A FACTORIAL STUDY OF TEACHING ABILITY 


Superior State College 
Superior, Wisconsin 


SECTION I 
INTRODUCTION 


MANY STUDIES of a followup nature have 
been made of graduates of teacher training curric- 
ula to determine the success of these persons 
as teachers. Such studies have been, in one 
manner or another, concerned with the assess- 
ment and measurement of teaching ability. That 
such evaluation is necessary has been pointed 
out many times. Barr1 says ‘‘the measurement 
of teaching ability is important: (1) in the train- 
ing of teachers in service; (2) in the institution- 
al training of teachers; (3) in the administration 
of the teaching staff; and (4) in research relat- 
ing to these fields.’’ Teacher training institu- 
tions are confronted with the dual problem of re- 
cruiting and selecting those candidates who seem 
to show the greatest promise of becoming suc- 
cessful teachers and of providing those curric- 
ular and extra-curricular activities which will 
aid these candidates when they have entered the 
teaching profession. Evaluation is important in 
both. 

Recent years have seen the role of the teach- 
er undergo significant changes. Baxter2 points 
out: ‘‘It is no longer enough that the teacher be 
the possessor of knowledge. Today’s teacher 
must be a ‘social engineer’, capable of setting 
up a provocative environment for children’s 
learning, charting the course of each individual 
child through the ever-changing social relation- 
ships in which he is involved and assisting each 
pupil to grow in his understanding of himself and 
of others.’’ One can readily realize the impli- 
cations of statements such as this for the re- 
cruitment, selection, training, and placement 
of teachers. 

Many persons have attempted to construct 
instruments of one sort or another that might 
be used in teacher evaluation. As yet, educa- 
tors have not, however, reached agreement 
concerning what constitutes an adequate criter- 
ion of teaching success. In general, research 
studies have employed criteria of the following 
types: (1) a rating assigned by the principal or 
superintendent; (2) a rating by the teacher’s 
supervisor or a visiting supervisor; (3) an eval- 
uation of the teacher by the pupils; and (4) an 
evaluation by the teacher’s peers. Less fre- 
quently a criterion of pupil gain has been used, 
but when this has been utilized the gains have 
been expressed as difference scores obtained 
by subtracting a pre-test score from a final test 
score. Most educators would agree that the 
teacher should provide an atmosphere which 
will foster desirable growth in students, but 
they would not all agree that this growth is to 
be assessed only by achievement tests in sub- 
ject matter areas. On occasion a self-evalua- 
tion has been advocated and used. Often a com- 
posite criterion has been formed by combining 
several of these criteria. 

Another problem confronting researchers is 
that of determining those traits or characteris- 
tics which appear to be related to teaching suc- 
cess. That many of these traits probably lie 
outside the area of subject matter competence 
has been pointed out by Olson3 in his comments 
on various studies made of teachers: ‘‘It is in- 
teresting and significant in the various studies 
that knowledge of the subject matter as such 
seems to be taken for granted and that the suc- 
cess or failure of a teacher appears to lie more 
largely in personal qualities and social relation- 
ships.’’ 

* The author has incurred many obligations in carrying out this study, in particular to Dr. A. S. Barr, his major professor, for his constant encouragement, assistance, and leadership throughout the course of this investigation. 

1. Arvil S. Barr. "The Measurement of Teaching Ability," Journal of Educational Research, XXVIII (April 1935), pp. 561. 

2. B. Baxter. Teacher Pupil Relationships (New York: Macmillan Co., 1950). 

3. Willard C. Olson. Child Development (Boston: D. C. Heath & Co., 1949), p. 285. 

That such qualities might be related to the 
temperament or personality of the teachers was 
pointed out in a UNESCO pamphlet:4 ‘‘Various 
research studies show clearly that the emotion- 
al stability of teachers affects that of pupils. 
What happens to boys and girls in school de- 
pends in large measure on the personal growth 
and development of the teachers with whom they 
have to work. Unhappy, frustrated, dissatisfied 
teachers cannot help their pupils to become 
happy, well-adjusted young people.’’ 


Statement of the Problem 





This dissertation is concerned with the rela- 
tionships between supervisory estimates of teach- 
ing success and various measures of teacher 
achievement, temperament, and personality. 

Specifically this study is concerned with de- 
veloping equations for the prediction of teaching 
success with a minimum number of variables 
and maximum stability as suggested by factor 
analysis of a variety of measures of teacher 
characteristics and efficiency. In order to de- 
velop a basis for simplified patterns of predic- 
tion a factorial study is made of the interrela- 
tionships among variables. This is done to de- 
termine factors common to estimates of teach- 
ing success and data-gathering devices common- 
ly employed to measure the achievement, tem- 
perament, and personality of teachers. 

In brief, this study will attempt to answer the 
following questions: 


1. Will a general factor account for the ma- 
trix of intercorrelations of the nine estimates 
of teaching success? 

2. How many common factors are necessary 
to account for the intercorrelations of the nine 
estimates of teaching success? 

3. Will estimates of these factors in the form 
of factor scores serve as a basis for separating 
successful teachers from unsuccessful teachers? 

4. Will a grouping of variables and an analysis 
similar to that used in (2) above, applied to three 
rectangular matrices of orders seven by nine, 
sixteen by nine, and ten by nine, respectively, 
account for the correlations in these matrices? 

5. What measures of achievement, tempera- 
ment, and personality are significantly related 
to these factor scores used as estimates of 
teaching success? 
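
Question 1 can be pictured with a small numerical sketch. The code below extracts a first centroid factor from a correlation matrix and checks whether any correlation is left over once that factor is removed. The three-variable matrix, the loadings used to build it, the function names, and the choice of the centroid procedure with communalities in the diagonal are all assumptions made for illustration; none of them is taken from the data or the computations of this study.

```python
import math

def first_centroid_loadings(r):
    """Loadings on the first centroid factor.

    Each loading is a column sum of the matrix divided by the
    square root of the grand sum of all entries.
    """
    n = len(r)
    col_sums = [sum(r[i][j] for i in range(n)) for j in range(n)]
    grand_sum = sum(col_sums)
    return [s / math.sqrt(grand_sum) for s in col_sums]

def residual_matrix(r, loadings):
    """Correlations remaining after the first factor is removed."""
    n = len(r)
    return [[r[i][j] - loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]

# Hypothetical matrix generated by one general factor with loadings
# 0.8, 0.7, 0.6; the diagonal holds communalities, not unities.
assumed = [0.8, 0.7, 0.6]
R = [[a * b for b in assumed] for a in assumed]

loadings = first_centroid_loadings(R)
residuals = residual_matrix(R, loadings)
largest_residual = max(abs(v) for row in residuals for v in row)
```

When a single general factor accounts for the matrix, as in this constructed case, the residuals are essentially zero; sizable residuals would point to the further common factors asked about in question 2.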
SECTION II 
DESIGN OF THE STUDY 


Description of the Subjects 


SIXTY-FOUR teachers engaged in their 
second year of teaching in Wisconsin high 
schools were chosen for this study. These six- 
ty-four cases are a sub-group of the total group 
of 101 beginning teachers visited during the be- 
ginning teacher visitation program conducted 
the latter half of the first semester of the 1950- 
1951 school year. Those visited had graduated 
from the University of Wisconsin in 1950 with 
the University Teachers Certificate and had ob- 
tained their first teaching positions in the State 
of Wisconsin in the fall of the year 1950. The 
thirty-seven teachers not included for study 
have left the teaching profession: some married, 
others became members of the armed forces, 
and several went into other vocational areas. 
The sixty-four remaining cases consist of thirty- 
four male and thirty female teachers. 

These teachers represent nineteen fields of 
major preparation. A tabulation of the majors 
and minors will be found in Table I. Twenty- 
eight subject areas are represented for which 
these teachers meet University of Wisconsin 
major or minor requirements or State of Wis- 
consin minor field certification requirements. 
The total frequency of majors is 67 as contrast- 
ed with the 64 teachers. This apparent discrep- 
ancy arises since teachers numbered 15 and 46 
had double majors of History and Economics, 
and teacher No. 47 had a double major of Speech 
and English. 

The mean total grade point average for the 
group of sixty-four teachers studied was 1.94 
with a standard deviation of .40. This com- 
pares very well with the mean of 1.88 and stand- 
ard deviation of .38 calculated for 205 (1949) 
school of education graduates and holders of the 
University Teachers Certificate. This group 
does not appear markedly different from the 
1949 graduates in terms of means and standard 
deviations. It cannot be determined whether this 
comparison would hold over a period of years, 
since statistics are not available. However, 
Schmid5 in a study of 102 (1948) graduates re- 
ported a mean of 1.75 and a standard deviation 
of .40 for total grade point average. 

4. U.N.E.S.C.O. The Education and Training of Teachers Toward World Understanding. 

5. John Schmid, Jr. "Factor Analyses of Prospective Teachers' Differences," Journal of Experimental Education, XVIII (June 1950), pp. 287-319. 


TABLE I 

TABULATION OF MAJORS AND MINORS FOR THE 64 TEACHERS 

Column headings: Subject Area; University Majors (Male, Female, Total); University Minors (Male, Female, Total); State Minors (Male, Female, Total). 

Subject areas: English; History; American Instit.; Agriculture; Physical Educ.; Home Economics; Business Educ.; Physics; Chemistry; Natural Science; French; Spanish; Speech; Speech Correction; Economics; Art Education; Sociology; Music; Recreation; General Science; Political Science; Mathematics; Botany; Biology; German; Geography; Zoology; Italian. 

[Most cell entries are not recoverable. Surviving figures: University Majors, Male: English 5, History 6, American Instit. 3, Agriculture 1, Physical Educ. 11; a further column for the same five rows reads 13, 7, 3, 1, 11. The Total row reads 37, 67, 22, 52, 22, 47, of which 67 is the total frequency of majors cited in the text.] 


The mean Total Score Percentile Rank on the 
American Council on Education Psychological 
Examination is 60.3. Bach6 in a study involv- 
ing seventy-six 1950 graduates obtained a mean 
of 55.8 while Lamke7 reported a mean ACE per- 
centile score of 72 for thirty-two 1950 graduates. 

The mean percentile rank in high school 
class is 75.7. 

It is thought that this group is reasonably rep- 
resentative of recent School of Education grad- 
uates at the University of Wisconsin. 


The Data-Gathering Devices Employed 





Forty-two variables were used in the analysis 
which follows. Nine of the variables are esti- 
mates of teaching success; seven were derived 
from the Thurstone Temperament Schedule; six- 
teen from the 16 P. F. Test; and the remaining 
ten from University records. 

The forty-two variables are listed below. The 
number preceding each measure is used to ident- 
ify that variable in subsequent correlation and 
factor matrices. A code symbol follows each 
measure. 


1. Acceptability rating by principal   PA 
2. Principal’s M-blank rating (1st year)   PM1 
3. Supervisor I M-blank rating   S1M 
4. Supervisor II M-blank rating   S2M 
5. Principal’s M-blank rating (2nd year)   PM2 
6. Teacher rating by an outside agency on a special rating form   SPR 
7. Teacher’s self evaluation   SE 
8. Peer evaluation   PE 
9. Pupil evaluation   PuE 
10. Active (Thurstone)   A-T 
11. Vigorous (Thurstone)   V-T 
12. Impulsive (Thurstone)   I-T 
13. Dominant (Thurstone)   D-T 
14. Stable (E for emotionally stable) (Thurstone)   E-T 
15. Sociable (Thurstone)   S-T 
16. Reflective (Thurstone)   R-T 
17. Cyclothymia vs. Schizothymia (16 P. F. Test)   A 
18. General Intelligence vs. Mental Defect (16 P. F. Test)   B 
19. Emotional Stability or Ego Strength vs. General Neuroticism (16 P. F. Test)   C 
20. Dominance or Ascendance vs. Submission (16 P. F. Test)   E 
21. Surgency vs. Desurgency (16 P. F. Test)   F 
22. Positive Character vs. Immature Dependent Char. (16 P. F. Test)   G 
23. Adventurous Cyclothymia vs. Inherent Withdrawn Schizothymia (16 P. F. Test)   H 
24. Emotional Sensitivity vs. Tough Maturity (16 P. F. Test)   I 
25. Paranoid Schizothymia vs. Trustful Accessibility (16 P. F. Test)   L 
26. Bohemianism vs. Practical Concernedness (16 P. F. Test)   M 
27. Sophistication vs. Rough Simplicity (16 P. F. Test)   N 
28. Worrying Suspiciousness vs. Calm Trustfulness (16 P. F. Test)   O 
29. Radicalism vs. Conservatism (16 P. F. Test)   Q1 
30. Independent Self-Sufficiency vs. Lack of Resolution (16 P. F. Test)   Q2 
31. Will Control and Character Stability (16 P. F. Test)   Q3 
32. Nervous Tension (16 P. F. Test)   Q4 
33. Practice Teaching grade in Education 75   PT75 
34. Practice Teaching grade in Education Methods Course   PTEM 
35. Grade in Education 73   G73 
36. Grade in Education 74   G74 
37. Lecture Section Grade, Education 75   LG75 
38. Percentile Rank in High School Class   PRHS 
39. American Council on Education Psychological Examination Percentile Rank   ACEPR 
40. Freshman-Sophomore Grade Point Average   Fr.-So. GPA 
41. Junior-Senior Grade Point Average   Jr.-Sr. GPA 
42. Composite: 2 credit major Educational Methods lecture section grade and 2 credit minor methods grade   EM4 


Sources of Data 





The following data were obtained from the 
beginning teacher visitation folders: 


1. Principal’s Acceptability rating. 

2. Principal’s rating on the Wisconsin Adap- 
tation of the M-Blank. 

3, 4. Two visiting supervisors’ ratings on 
the M-Blank. 


6. Jacob O. Bach. Practice Teaching Success in Relation to Other Measures of Teaching Ability, Ph.D. Dissertation, University of Wisconsin, 1951. 

7. Tom A. Lamke. "Personality and Teaching Success," Journal of Experimental Education, XX (December 1951), pp. 217-259. 


The following estimates of teaching success 
were obtained during the 1951-1952 school year: 

5. Principal’s evaluation using the Wisconsin 
Adaptation of the M-Blank. 

6. Rating of the teacher by an outside agency 
on a special rating form. 

7. Self-evaluation. 

8. Peer evaluation. 

9. Pupil evaluation. 


Data for assessing temperament traits were 
obtained by administering the Thurstone Temper- 
ament Schedule to each subject. (Variables 10 
through 16.) 

Data for assessing the sixteen source traits 
of personality were obtained by administering 
the 16 Personality Factor Test to each teacher. 
(Variables 17 through 32.) 

The following data were obtained from Uni- 
versity records: 


33. Practice Teaching grade in Education 75. 

34. Practice Teaching grade in Education 
Methods Course. 

35, 36, and 37. Grades in Education 73, Ed- 
ucation 74, and Lecture Section grade in Educa- 
tion 75. 

38. Percentile rank in high school class. 

39. American Council on Education Psycho- 
logical Examination Percentile Rank. 

40. Freshman-Sophomore Grade Point Av- 
erage. 

41. Junior-Senior Grade Point Average. 

42. Lecture Section grade in major methods 
course and grade in 2 credit minor methods 
course. 


Description of the Measures Used 





1. Principal’s acceptability rating, PA— 
This criterion was defined by 
Lamke8, Ringness9, and Bach10 to be that qual- 
ity of the teacher which leads other people to feel 
that she should be retained in her present posi- 
tion or possibly promoted to a more responsible 
assignment. In the fall of 1950, teams of visit- 
ing supervisors from the University of Wiscon- 
sin visited each beginning teacher. One grad- 
uate student from a core group of six graduate 
students was usually included in each team of 
two supervisors which visited each teacher 
during the same class period. This graduate 
student conducted the interview with the prin- 
cipal to obtain the acceptability rating. The fol- 
lowing typical questions were asked at the end 
of the interview: 


1. How do you like this teacher? 

2. Does she get along well with the students? 

3. Do the parents feel the same way as the 
pupils do about the teacher? 

4. Is the teacher happy here? Does she 
have friends on the faculty, in the community, 
etc.? 

5. What strengths or weaknesses does this 
teacher have? 

6. In terms of beginning teachers with whom 
you have worked in the past and in light of our 
discussion as a whole, would you consider this 
teacher as (1) among the best, (2) above aver- 
age, (3) average, (4) below average, or (5) fail- 
ing as a beginning teacher? 


The principal was thus led to consider some 
of the teacher’s more important relationships 
before he was asked for his acceptability rat- 
ing which is his response to question (6). A 
low score indicates a high rating. 

2. Principal’s M-Blank rating (1st year), 
PM1—A two-page form of the Wisconsin Adap- 
tation of the M-Blank was used during the be- 
ginning teacher visitation program (see Appen- 
dix B)*. The principal was asked for his rat- 
ing of each teacher using this form. The ‘‘Gen- 
eral Evaluation’’ rating was the score used in 
this study. The categories on a five-point 
scale were as follows: (1) outstanding, (2) above 
average, (3) average, (4) below average, and 
(5) poor. A low score indicates a high rating. 

3. Supervisor I M-Blank rating, S1M, and 

4. Supervisor II M-Blank rating, S2M— 

As indicated previously, one graduate student 
of the core group was usually included on each 
visitation team. The rating assigned by this 
student was classified as Supervisor I rating 
and the rating of the other visiting supervisor, 
Supervisor II. In several cases, two students 
of this core group formed the team. In these 
instances the Supervisor I and Supervisor II 
ratings were classified alphabetically. In the 
case of each of these ratings a low score indi- 
cates a high rating. 


8. Ibid. 

9. Tom A. Ringness. Relationships Between Certain Attitudes Towards Teaching and Teaching Success, Ph.D. Dissertation, University of Wisconsin, 1951. 

10. Op. cit. 

* All references to Appendices may be found in original thesis on file in Library, University of Wisconsin. 

5. Principal’s M-Blank rating (2nd year), 
PM2—A form letter and a one-page form of the 
Wisconsin Adaptation of the M-Blank (see Ap- 
pendix C) were sent to each principal in January 
of this year. The rating under the item ‘‘Gen- 
eral Evaluation’’ was the score used in this 
study. A low score indicates a high rating. 

6. Teacher rating by an outside agency on a 
special rating form, SPR—A low score indicates 
a high rating. 

7. Teacher’s self-evaluation, SE—Harold 
M. Anderson of the University of Colorado adapt- 
ed a pupil evaluation form developed by Bryan 
for use by pupils, peers, and the teacher. The 
Teacher’s Self Evaluation form (see Appendix 
D) was sent to each teacher during the past 
school year. The overall evaluation (item 10) 
was the score used in this study. The categor- 
ies of item 10 were as follows: (1) best, (2) sec- 
ond, (3) middle, (4) fourth, and (5) lowest. A 
low score indicates a high rating. 

8. Peer evaluation, PE—Two fellow teachers 
were sent a form letter asking them to rate the 
teacher using the Peer Evaluation of Instruction 
Form (see Appendix E). An attempt was made 
to have each teacher rated by one male and one 
female colleague. Those asked to rate the tea- 
cher were chosen, using the Official Wisconsin 
School Directory, from the same subject area, 
insofar as possible, as the teacher to be rated. 
The response to item 9 with the following cate- 
gories, upper, second, middle, fourth, and low- 
est, was the rating used in this study. The sum 
of these two peer ratings was the peer evaluation 
score used. A low score indicates a high rating. 

9. Pupil evaluation, PuE—Each teacher’s 
principal was sent a form letter and five Pupil 
Reaction to Instruction forms (see Appendix F). 
He was asked to secure the reactions of five 
students chosen at random from one of the clas- 
ses taught by the teacher. Each student returned 
his completed form in a self-addressed envelope 
furnished by the Teacher Personnel Research 
Committee. The sum of these five ratings on 
the general category rating with (1) best fifth, 
(2) second fifth, (3) middle fifth, (4) fourth fifth, 
and (5) poorest fifth was the pupil evaluation rat- 
ing score used in this study. A low number in- 
dicates a high rating. 
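
The peer and pupil scores described in items 8 and 9 are simple sums of five-point ratings, so a short sketch may make the resulting score ranges concrete. The function name, its validation rules, and the example ratings below are hypothetical; only the summing rule and the convention that a low number is a favorable rating come from the text.

```python
def composite_rating(ratings, expected_count):
    """Sum a fixed number of five-point ratings (1 = best, 5 = lowest)."""
    if len(ratings) != expected_count:
        raise ValueError("unexpected number of ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("each rating must be an integer from 1 to 5")
    return sum(ratings)

# Hypothetical ratings for one teacher.
peer_score = composite_rating([2, 3], expected_count=2)            # possible range 2 to 10
pupil_score = composite_rating([1, 2, 2, 3, 1], expected_count=5)  # possible range 5 to 25
```

Because the two composites have different ranges, they are not directly comparable until they are correlated or standardized.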

Before passing on to the description of other 
data-gathering devices it might be well to discuss 
certain terms. 

















What is teaching success? 





Educators agree that teaching is an exacting 
and complex act. Certain components such as 
the unique physical setting of the teaching situ- 
ation, and the presence of one or more persons 
in this setting are prerequisites to the teaching 
act. Each individual, including the teacher, 
brings to such a setting his own unique pattern 
of behavior. No two teaching situations are ex- 
actly alike due to the complex and intricate in- 
terplay of these unique components. By and 
large, a teacher in a given situation will be 
judged successful or unsuccessful, effective or 
ineffective, in terms of how he adjusts his own 
unique pattern of behavior to the unique physical 
setting and to the unique behavioral patterns of 
those with whom he has contact. 

The terms ‘‘successful’’ and ‘‘effective’’ as 
applied to teachers need not be and probably are 
not synonymous. If by an effective teacher we 
mean one who provides an atmosphere which 
will foster desirable growth in pupils, then 
agreement would be needed on what desirable 
pupil changes we wish to measure, and means of 
assessing these would have to be determined. 
Barr11 in 1935 stated: ‘‘....the ultimate criterion of teach- 
ing success will have to be found in the changes 
produced in pupils measured in terms of the ob- 
jectives of education.... These changes or pro- 
ducts of instruction will have to be considered 
broadly. We shall have to take into considera- 
tion not merely knowledges and skills and the 
more tangible outcomes of instruction but atti- 
tudes and ideals and the less tangible outcomes 
of instruction.’’ If this could be done the degree 
of effectiveness of a teacher could be determ- 
ined according to prescribed standards. Up to 
the present time this has not been adequately ac- 
complished for any grade level or for any sub- 
ject matter area. Effectiveness so determined 
could be considered either as an index of suc- 
cess or as an index of effectiveness for that 
teacher. 

Success is a broader term that not only in- 
volves the teacher as a director of learning but 
also involves the teacher as a friend and coun- 
selor of pupils, as a member of a professional 
staff, and as a citizen in a community. To be 
successful in this broader sense, a teacher not 
only must produce desirable changes in pupils 
but he must be acceptable to the administration, 
to his peers, to the pupils, to the parents, and 
to the community. Many people will feel that to 
be effective (as defined in this study), a teacher 
must possess these broader competencies. 

The evaluations and judgments made about 
teachers by principals and supervisors continue 
to be important in education. Very often these 
evaluations are given in the form of ratings 
such as those used in this study. On the basis 
of such ratings teachers often are promoted, 
are given pay increases, and are granted licen- 
ses and certificates. Such a rating is the rater’s 
opinion of the teacher expressed as a value judg- 
ment. The question of reliability does not arise 
if we view this rating as the rater’s judgment of 
this teacher’s effectiveness relative to the time 
and to the specific situation in which she was 
observed. The question of reliability may arise, 
however, if at some later date the same rater 
is asked to rate this same teacher again using 
the same rating instrument. If one asks the 
question: ‘‘Do the two ratings agree?’’ a question 
of reliability clearly arises. If the two ratings 
do not correlate highly we cannot be sure wheth- 
er this lack of agreement is due to a change in 
the rater, a change in the ratee, or a change in 
the interpretation or application of the instru- 
ment. 
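
The agreement between two sets of ratings made by the same rater can be expressed as a product-moment correlation. The sketch below computes that coefficient for two short lists of ratings; the lists are invented for illustration, and the function name is an assumption rather than anything used in this study.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical first and second ratings of six teachers by one rater.
first_rating = [1, 2, 2, 3, 4, 5]
second_rating = [1, 2, 3, 3, 4, 4]

agreement = pearson_r(first_rating, second_rating)
```

A coefficient well below unity shows only that the two sets of ratings differ; as noted above, it does not show whether the rater, the teacher, or the use of the instrument has changed.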

The ratings used in this study were 
obtained as judgments of the principal, a visit- 
ing supervisor, the teacher’s peers, and the 
pupils taught by the teacher concerning the ef- 
fectiveness of a particular teacher at a particu- 
lar time in a particular situation. The specific 
instruments utilized to obtain these judgments 
have been described in detail earlier in this sec- 
tion. In this study we will examine the interrela- 
tionships found to exist between these several 
rating judgments to determine if these judgments 
are measuring different factors. Described 
next are certain measures of temperament. 

10. through 16. Thurstone Temperament 
Schedule, TTS—This questionnaire was devel- 
oped to show types of temperament. Seven ar- 
eas of temperament are appraised by questions 
pertaining to likes and dislikes, preferences and 
habits, in everyday life. The schedule contains 
140 such questions. The twenty items scored 
for each area are scattered throughout the sched- 
ule. The testee answers each question with Yes, 
No, or ?. However, only the Yes or No re- 
sponses score on a self-scoring answer pad ac- 
companying the booklet. A copy of the Thurstone 
Temperament Schedule and the mimeographed 
letter of instructions sent to each teacher are in- 
cluded in Appendix G. 

Test-retest reliabilities for these seven 
temperament areas were computed for 81 male 
executives. The retests were given within six 
months of the first administration. The relia- 
bilities were: A, 0.78; V, 0.78; I, 0.79; D, 
0.82; E, 0.61; S, 0.73; and R, 0.75. 

Split-half reliabilities corrected by the Spear- 
man-Brown formula for 200 college men were: 
A, 0.48; V, 0.61; I, 0.65; D, 0.77; E, 0.63; 
S, 0.68; R, 0.73. Split-half reliabilities com- 
puted for 157 college women in a similar man- 
ner were: A, 0.46; V, 0.63; I, 0.65; D, 0.77; 
E, 0.64; S, 0.73; and R, 0.62. 
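
The Spearman-Brown correction named above estimates the reliability of a full-length test from the correlation between its two halves. The sketch below states the standard formula and its inverse; the function names are assumptions, and the half-test correlation it recovers for a corrected value of 0.77 is a derived figure, not one reported in the manual.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Reliability of a test lengthened by length_factor.

    With length_factor = 2 this is the split-half correction:
    r_full = 2 * r_half / (1 + r_half).
    """
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)

def implied_half_correlation(r_full):
    """Half-test correlation that yields r_full after the split-half correction."""
    return r_full / (2.0 - r_full)

corrected = spearman_brown(0.50)                # half-test r of 0.50, hypothetical
half_for_d = implied_half_correlation(0.77)     # corrected value reported for Dominant
```

Working backward in this way shows how much of a reported coefficient is owed to the correction: a corrected reliability is always at least as large as the half-test correlation from which it was computed.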

The seven areas of temperament assessed 
by this schedule are described below. These 
descriptions are quoted from the Examiner Man- 
ual for the Thurstone Temperament Schedule. 


10. Active, A-T 

A person scoring high in this area usually 
works and moves rapidly. He is restless when- 
ever he has to be quiet. He likes to be ‘‘on the 
go’’ and tends to hurry. He usually speaks, 
walks, writes, drives, and works rapidly, even 
when these activities do not demand speed. 


11. Vigorous, V-T 

A person with a high score in this area par- 
ticipates in physical sports, work requiring the 
use of his hands and the use of tools, and out- 
door occupations. The area emphasizes phys- 
ical activity using large muscle groups and 
great expenditure of energy. This trait is often 
described as ‘‘masculine’’, but many women 
and girls will score high in this area. 


12. Impulsive, I-T 

High scores in this category indicate a happy- 
go-lucky, daredevil, carefree, acting-on-the- 
spur-of-the-moment disposition. The person 
makes decisions quickly, enjoys competition, 
and changes easily from one task to another. 
The decision to act or change is quick regard- 
less of whether the person moves slowly or rap- 
idly (Active), or enjoys or dislikes strenuous 
projects (Vigorous). A person who doggedly 
‘‘hangs on’’ when acting or thinking is typically 
low in this area. 


13. Dominant, D-T 

People scoring high on this factor think of 
themselves as leaders, capable of taking initi- 
ative and responsibility. They are not domin- 
eering, even though they have leadership abil- 
ity. They enjoy public speaking, organizing 
social activities, promoting new projects, and 
persuading others. They are the ones who would 
probably take charge of the situation in case of 
an accident. 


14. Stable (E for emotionally stable), E-T 
Persons who have high scores usually are 
cheerful and have an even disposition. They can 
relax in a noisy room, and they remain calm in 
a crisis. They claim that they can disregard 
distractions while studying. They are not irri- 
tated if interrupted when concentrating, and they 
do not fret about daily chores. They are not an- 
noyed by leaving a task unfinished or by having 
to finish it by a deadline. 


15. Sociable, S-T 

Persons with high scores in this area enjoy 
the company of others, make friends easily, and 
are sympathetic, cooperative, and agreeable in 
their relations with people. Strangers readily 
tell them about personal troubles. 


16. Reflective, R-T 

High scores in this area indicate that a per- 
son likes meditative and reflective thinking and 
enjoys dealing with theoretical rather than prac- 
tical problems. Self-examination is character- 
istic of reflective persons. These people are 
usually quiet, work alone, and enjoy work that 
requires accuracy and fine detail. They often 
take on more than they can finish, and they would 
rather plan a job than carry it out. 


The following consideration regarding the 
use of the term temperament was thought im- 
portant for this study. 

Allport12 speaks of temperament as being a 
certain class of raw material from which per- 
sonality is fashioned. ‘‘Strictly speaking there 
is no temperament apart from personality, nor 
any personality devoid of temperament.’’ He 
gives the following definition of temperament:13 
‘‘Temperament refers to the characteristic phen- 
omena of an individual’s emotional nature, in- 
cluding his susceptibility to emotional stimula- 
tions, his customary strength and speed of re- 
sponse, the quality of his prevailing mood, and 
all peculiarities of fluctuation and intensity in 
mood; these phenomena being regarded as de- 
pendent upon constitutional make-up, and there- 
fore largely hereditary in origin.’’ Those areas 
of temperament assessed by the Thurstone Tem- 
perament Schedule will be considered in this 
study. Seven areas of temperament are apprais- 
ed by this questionnaire. Thurstone14 states 
that his schedule was ‘‘....designed to assess 
those traits which are relatively permanent for 
each person.... It is limited to a practical de- 
scription of important aspects of temperament.’’ 


17. through 32. 16 P. F. Test (Cattell), 16 PF 
This self report questionnaire was construct- 
ed to assess the sixteen source traits which Cat- 
tell has postulated account for the fundamental 
structure of personality. The validity of the test 
is logical. The test consists of two forms, A 
and B. Each form contains 187 questions each 
of which may be answered by possible responses 
of Yes, ?, No. Cattell recommends the forced 
choice response with the elimination of ‘‘?’’ un- 
der certain conditions. This recommendation 
was followed in this study. The questions which 
score for each factor are scattered throughout 
the test. A response to a given question contrib- 
utes to the score of only one factor. 

The split-half reliability of the test on a 
sample of 200 of the general population, correct- 
ed to the full number of items in the A and B 
forms, is reported in the test manual to be as 
follows for the sixteen factors: A, 0.84; B, 0.70; 
C, 0.71; E, 0.82; F, 0.85; G, 0.56; H, 0.74; 
I, 0.54; L, 0.55; M, 0.72; N, 0.65; O, 0.88; Q1, 
0.50; Q2, 0.61; Q3, 0.53; Q4, 0.76. 

Both forms were administered to each sub- 
ject. A copy of the 16 P. F. Test and the mim- 
eographed letter of instructions sent to each 
teacher are included in Appendix H. 

The sixteen source traits of personality as- 
sessed by the test are described below primar- 
ily in terms of questionnaire responses. The 
plus pole corresponds to the category listed on 
the left and is denoted by a high numerical score 
for that factor. For more detailed descriptions 
of these factors the reader is referred to many 
of Cattell’s written reports (see Bibliography) 
or Guide to Principles and Techniques of Per- 
sonality and Ability Testing, and Handbook for 
the Sixteen Personality Factor Questionnaire. 
The following partial descriptions of these fac- 
tors are quoted from the latter publication: 


17. Factor A. Cyclothymia vs. Schizothymia, A
A score of 40 is possible for this factor. 
This factor has been found to load most highly 

the following traits: 
Good natured, easy going vs. spiteful, grasp- 
ing, critical 
Ready to cooperate vs. obstructive 
Attentive to people vs. cool, aloof 
Soft hearted, kindly vs. hard 
Trustful vs. suspicious 
Adaptable vs. rigid 
Warm hearted vs. cold 





....In questionnaire responses the A+ indi- 
vidual expresses marked preference for occu- 
pations dealing with people and socially impres- 
sive situations while the A- person likes things 
or words (particularly material things), work- 
ing alone, intellectual companionship, and avoid-
ance of clash of viewpoints.


18. Factor B. General Intelligence vs. Mental 
Defect, B 

A score of 26 is possible for this factor. The 
measurement of intelligence carries with it as 
a factor in the personality realm some of the 
following ratings: 

Conscientious vs. somewhat unscrupulous 

Persevering vs. quitting 

Intellectual, cultured vs. boorish 





....and indicating some moderate tendency 
for the more intelligent person to have some- 
what more of the traits which we often call char- 
acter traits. 


19. Factor C. Emotional Stability or Ego 
Strength vs. General Neuroticism, C 
A score of 52 is the highest possible score 
for this factor. This factor loads: 
Emotionally mature vs. lacking in frustra- 
tion tolerance 
Emotionally stable vs. changeable 
Calm, phlegmatic vs. showing general emo- 
tionality 
Realistic about life vs. evasive 
Absence of neurotic fatigue vs. neurotically 
fatigued 
Placid vs. worrying 








This factor is one of good versus poor dyn- 
amic integration of the personality.... In the 
questionnaire manifestation the C- person is 
easily annoyed by things and people, is dissat- 
isfied with the world situation, his family, the 
restrictions of life and his own health. 


20. Factor E. Dominance or Ascendance vs.
Submission, E

A score of 52 is possible for this factor. 
Assertive, self-assured vs. submissive 
Independent minded vs. dependent 
Hard, stern vs. kindly, soft-hearted 
Solemn vs. expressive 
Unconventional vs. conventional 
Tough vs. easily upset 
Attention getting vs. self-sufficient 





....At present no good objective tests exist 
for this factor, since it seems to show itself 
more in human relations than in assertion over 
inanimate difficulties. The chief questionnaire 
responses indicate a tendency to assert oneself 
socially, to believe that one is right in disagree- 
ments, to borrow or trespass without embar- 
rassment, and to be ready to take a chance in 
dangerous situations. 


21. Factor F. Surgency vs. Desurgency, F 
A score of 52 is possible for this factor. 
This factor loads:
Talkative vs. silent, introspective
Cheerful vs. depressed
Placid vs. anxious 
Frank, expressive vs. incommunicative, 
smug 
Quick, alert vs. languid, slow 


.... The questionnaire responses show F+ 
to be carefree, happy-go-lucky, fond of bustle 
and excitement, inclined to practical jokes and 
disinclined to occupations requiring close and 
accurate work. There is a significant trend 
from F+ behavior with age. 


22. Factor G. Positive Character vs. Imma- 
ture Dependent Character, G 
A score of 40 is the highest possible score 
for this factor. 
Persevering, determined vs. quitting, fickle 
Responsible vs. frivolous 
Emotionally mature vs. demanding, impa- 
tient 
Consistently ordered vs. relaxed, indolent 
Conscientious vs. undependable 
Attentive to people vs. obstructive 








....Subjectively the G+ person views him-
self as correct in, and a guardian of manners
and morals, persevering, planful, able to con- 
centrate, interested in analyzing people, cau- 
tious and preferring efficient people to other 
companions. 


23. Factor H. Adventurous Cyclothymia vs. 
Inherent Withdrawn Schizothymia, H 
A score of 52 is possible for this factor. 
This factor loads: 
Gregarious sociability vs. shyness, with- 
drawing tendency 
Adventurous, bold vs. cautious, retiring
Having marked interest in the opposite sex 
vs. slight interest in the opposite sex 
Frivolous vs. conscientious 
Strong, artistic, or sentimental interests 
vs. lack of same 
Abundant emotional response vs. coolness, 
aloofness 








.... The H- individual reports himself to be 
intensely shy, convinced of his inferiority, slow 
and impeded in expressing himself, disliking 
occupations with personal contacts, preferring 
one or two close friends to large groups and 
not able to keep in contact with all that is go- 
ing on around him. 


24. Factor I. Emotional Sensitivity vs. Tough 
Maturity, I 
A score of 40 is possible for this factor. 
This factor loads: 
Demanding, impatient vs. emotionally ma-
ture
Dependent, immature vs. independent mind-
ed
Imaginative, introspective vs. set and smug
Kindly, gentle vs. hard, cynical
Aesthetically fastidious vs. lacking artistic
feeling
Frivolous vs. responsible
Attention getting vs. self-sufficient


....In the questionnaire realm, such a fac- 
tor has appeared several times and has been 
called self-sufficiency, but as there are two self-
sufficiency factors, this needs special descrip-
tion, particularly since attention-getting appears 
in the alleged self-sufficiency. The individual 
typically shows a fastidious aversion to ‘‘crude’’ 
people and occupations but a liking for travel 
and new experiences along with a labile, imag- 
inative, aesthetic mind and a certain imprac- 
ticality in general affairs. 


25. Factor L. Paranoid Schizothymia vs. Trust- 
ful Accessibility, L 
A score of 40 is possible for this factor. 
This factor loads: 
Prone to jealousy vs. free of jealous tenden- 
cies 
Placid, shy, bashful vs. composed 
Suspicious vs. trustful 
Dour vs. cheerful 
Rigid vs. adaptable 
Hard and unconcerned vs. concerned about 
other people 








....Objectively the individual reports that 
he is annoyed by people claiming to be superior
to others and that he considers himself as a per- 
son who is scrupulously correct in his behavior. 
He also shows some intensive interests in intern- 
al, mental life. 


26. Factor M. Bohemianism vs. Practical Con- 
cernedness, M 
A score of 52 is possible for this factor. 
This factor loads: 
Unconventional, eccentric vs. conventional 
Sensitively imaginative vs. practical, logical 
Undependable vs. conscientious 
Placid exterior vs. easily concerned and ex- 
pressive 
Occasional hysterical emotion vs. given to 
keeping head in emergencies 





.... There are some evidences in the ques- 
tionnaire of the conversion hysteria type of in- 
stability at the ‘‘Bohemian’’ pole of this factor. 
For example, the person walks and talks in his 
sleep, makes demands on others with placidity, 
gets emotional but does not worry, is oblivious 
to convention but can feel childishly dependent
and insecure episodically. 


27. Factor N. Sophistication vs. Rough Sim- 
plicity, N 
A score of 40 is possible for this factor. 
This factor loads: 
Polished vs. clumsy, awkward 
Cool, aloof vs. attentive to people 
Fastidious vs. easily pleased 





....In terms of the questionnaires, it has 
been discovered two or three times under vary- 
ing titles such as intellectual leadership, hard 
headed rationalism, and titles indicating a so- 
phisticated, intellectual, unsentimental ap- 
proach to things. 


28. Factor O. Worrying Suspiciousness vs.
Calm Trustfulness, O
A score of 52 is possible for this factor.
This factor is not very well defined in ratings,
but loads:
Worrying, anxious vs. placid, tough
Suspicious, brooding vs. trustful, free from
suspiciousness








In various questionnaires it has been very 
well defined and variously labelled as ‘‘Depres- 
sive Tendency, ’’ ‘‘Moodiness, ’’ ‘‘Emotional 
Sensitivity, ’’ ‘‘Self Depreciation, ’’ and ‘‘Gen- 
eral Neuroticism.’’ It is, however, quite dis- 
tinct from the general neuroticism factor lab- 
elled C above, and it may be that it is some 
relatively temporary condition of depression 
and worrying which is not a stable feature of 
the personality. At any rate, it is somewhat 
decidedly more obvious in the mental interior 
of the subject than it is to the observer, caus- 
ing him to feel remorseful, downhearted, sub- 
ject to phobias and neurasthenic symptoms, 
worrying, avoiding people and perturbed by 
the inconstancy of his own moods. 

29. Factor Q1. Radicalism vs. Conservatism,
Q1

A score of 40 is possible for this factor. 
This factor has not appeared in personality rat- 
ings, and appears most clearly in attitudes. 
Nevertheless, it does have some role in the total
personality, and the more radical show evidence 
of being more introspective, more interested 
in fundamental issues, and more interested in 
intellectual matters generally. There is also 
some evidence from more objective test situa- 
tions that Q1+ persons are more well informed,
less inclined to moralize, and more inclined 
to experiment in life generally. The views of 
the more radical or more conservative on a set
of actual attitudes are already clearly indicat- 
ed in the questionnaire itself. This factor and 
the remaining three factors are not given alpha-
betical labels because they have not yet been
demonstrated to appear in the series of factors 
obtainable in ratings. 


30. Factor Q2. Independent Self-Sufficiency vs.
Lack of Resolution, Q2 

A score of 40 is possible for this factor. As 
will be evident from the questionnaire responses 
themselves, this factor in its positive loadings 
indicates an individual who is resolute and ac- 
customed to going his own way, but who is not 
necessarily dominant in his relation to other 
people. The Q2- person prefers to work and
make decisions in company with other people, 
likes social approval and admiration, is conven- 
tional and fashionable. The predictive value 
will be evident from the occupational associa- 
tions. 








31. Factor Q3. Will Control and Character Sta-
bility, Q3

A score of 40 is possible for this factor. 
This factor has some relation to the C and G
factors described above, yet it is not very evi- 
dent in ratings and is still listed and defined 
primarily as a factor in the questionnaire re- 
sponses for mental inferiors. Individuals high 
in this factor show, according to the question- 
naire responses, strong control of emotions and 
behavior. They are inclined to be considerate, 
careful, and conscientious, but also obstinate. 
Although no behavior ratings have picked out the 
factor, it has been observed that when people 
who are high in this factor have been put together 
in groups, the general efficiency and objective- 
ness of the group is decidedly higher than in 
groups that are only average in the factor. There 
are indications that persons high in this factor 
are inclined to mathematical interests. 





32. Factor Q4. Nervous Tension, Q4 

A score of 52 is possible for this factor. 
This factor has been found in earlier question- 
naire studies where it has been called nervous- 
ness, nervous anxiety and instability, and sleep 
difficulties with somaesthenia. The general pic- 
ture is that of a person who is tense, fatigued 
but is unable to remain inactive. The dimension 
resembles a common description of what disting- 
uished the hypertensive person. The ratings 
show some correlation between this factor and 
strength of interest in the opposite sex, and it 
is possible that it is to be explained as a sex 








ERICKSON 


suppression factor. Until the research now
in progress is completed, however, the predic- 
tions from this factor must rest on the known 
character of the responses, and on whatever as- 
sociations with occupational norms are shown 
below. 


Personality of the teachers used in this 
study will be assessed by means of the Cattell 
Sixteen Personality Factor Questionnaire. Cat-
tell makes this statement regarding this ques-
tionnaire:15 ‘‘The questionnaire thus aims to
leave out no important aspect of the total per- 
sonality, for the above factors are based on 
even sampling from the personality sphere and 
include abilities (intelligence), temperamental 
factors, and dynamic (character integration) 
source traits.’’ Cattell16 makes the following
distinctions between the three modalities of 
dynamic traits (or interests), abilities, and 
temperamental traits: ‘‘Dynamic traits are 
characterized by behavior arising from a stim- 
ulus situation or incentive and directed to some 
goal, at which the action ceases. Abilities, by 
contrast, are shown by how well the person 
makes his way to the accepted goals. Dynamic 
traits are thus traits in which performance var- 
ies as the incentive varies; whereas abilities 
can be recognized as those in which perform-
ance varies in response to changes in complex-
ity. Temperament traits are definable by ex-
clusion as those traits which are unaffected by 
incentive or complexity. These are traits like 
high-strungness, speed, energy, and emotional 
reactivity, which common observation suggests 
are largely constitutional.’’

The first twelve primary source traits of 
personality described earlier in this section 
have been determined by previous research, 
primarily factor analysis, to be shown in be-
havior ratings, questionnaire or ‘‘self-rating’’
items, and in objective tests. The other four 
primary personality factors have thus far been 
found only in questionnaire responses. 


33. Practice Teaching Grade in Education 75,
PT75

This grade is available on a special card
filed in the education office at Wisconsin 
High School. Since such letter grades usually 
carry plus and minus signs the following 
transformation scheme was used: A = 1; A- =
2; B+ = 3; B = 4; C- = 8. A low score indi-
cated a high grade.
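
The transformation above can be sketched as a simple lookup. Only the scores for A, A-, B+, B, and C- are given in the text; the values for B-, C+, and C below are interpolated and should be read as an assumption, as should the function name.

```python
# Scores for A, A-, B+, B and C- are stated in the text.
# B-, C+ and C are interpolated here and are an assumption.
PRACTICE_GRADE_SCORES = {
    "A": 1, "A-": 2,
    "B+": 3, "B": 4, "B-": 5,
    "C+": 6, "C": 7, "C-": 8,
}


def practice_grade_score(letter: str) -> int:
    """Convert a signed letter grade to its numerical score (low score = high grade)."""
    return PRACTICE_GRADE_SCORES[letter.strip().upper()]


if __name__ == "__main__":
    for grade in ("A", "B+", "C-"):
        print(grade, practice_grade_score(grade))
```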








15. R. B. Cattell and others. Handbook for the Sixteen Personality Factor Questionnaire (Champaign,
Illinois: Institute for Personality and Ability Testing), p. 1.

16. Raymond B. Cattell. Personality (New York: McGraw-Hill Book Co., 1950), p. 35.
34. Practice Teaching Grade in Education Meth-
ods Course, PTEM
This grade is also filed at Wisconsin High 
School. The letter grades were converted into 
numerical scores using the same procedure as 
that outlined in 33 above. A low score indicates 
a high grade. 





35. Grade in Education 73, G73, and 
36. Grade in Education 74, G74 

These grades were obtained from the under- 
graduate transcript of credits of these teachers. 
The letter grades were converted to numerical 
scores by using the following system: A = 1;
B = 2; C = 3; and D = 4. A low score indicates
a high grade. 








37. Lecture Section Grade, Education 75, LG75 
The 3-credit lecture section grade is com- 
bined with the 2-credit practice teaching grade 
to form the 5-credit Education 75 grade found 
on the University records. The 3-credit grade 
was determined by consultation with the individ- 
ual instructors to obtain the best estimate which 
when weighted three times with twice weighted 
practice teaching would yield the 5-credit grade. 
A conversion system similar to that used for var- 
iables 35 and 36 was used to convert to numer- 
ical scores. A low score indicates a high grade.
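
The credit weighting described for variable 37 can be illustrated with a short sketch. It assumes that all three grades sit on the same numerical scale and that the 5-credit grade is an exact credit-weighted mean; in the study the lecture grade was an instructor's best estimate, so this is only an approximation of the reasoning, and the function names are hypothetical.

```python
def combined_grade(lecture: float, practice: float) -> float:
    """Credit-weighted mean of a 3-credit lecture grade and a 2-credit practice grade."""
    return (3.0 * lecture + 2.0 * practice) / 5.0


def lecture_grade_estimate(course: float, practice: float) -> float:
    """Lecture grade which, weighted 3 to 2 with the practice grade, gives the course grade."""
    return (5.0 * course - 2.0 * practice) / 3.0


if __name__ == "__main__":
    # A course grade of B (2) with practice teaching graded A (1).
    estimate = lecture_grade_estimate(2.0, 1.0)
    print(estimate, combined_grade(estimate, 1.0))
```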





38. Percentile Rank in High School Class, PRHS

The rank of each student in his high school
graduating class was divided by the number of
students in that class. This quotient, expressed
as a percentage, was subtracted from 100 to yield
the percentile rank in high school class used in
this study. A high score indicates a high relative
rank in high school graduating class.
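
As a minimal sketch of this computation, assuming the quotient is multiplied by 100 before the subtraction (the function name and the validation are illustrative):

```python
def percentile_rank_in_class(rank: int, class_size: int) -> float:
    """Percentile rank from class standing; rank 1 is the top student."""
    if class_size <= 0 or not 1 <= rank <= class_size:
        raise ValueError("rank must lie between 1 and class_size")
    return 100.0 - 100.0 * rank / class_size


if __name__ == "__main__":
    # The top student in a class of 100, and the 25th student in a class of 50.
    print(percentile_rank_in_class(1, 100), percentile_rank_in_class(25, 50))
```

On this definition the last-ranked student in any class receives a percentile rank of zero, and the top student's value depends on the size of the class.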





39. American Council on Education Psycholog- 
ical Examination for College Freshmen Per- 
centile Rank, ACEPR 

The total score percentile rank was the meas- 
ure of intelligence used in this study. Those 
teachers who entered the University of Wiscon-
sin in the fall of 1946 were administered 1945
editions of this examination during Orientation
Week, 1946. The majority of scores used in
this study are from the 1945 edition of this test.
National norms were used to determine the per-
centile ranks. The same norms were used for
the several groups. A high score indicates a
high level of intelligence.








40. Freshman-Sophomore Grade Point Average,
Fr.-So. GPA
The following system was employed in the 
computation of this grade point average. A stud-
ent may register as a junior in the School of Ed-
ucation at the University of Wisconsin if he has
earned a minimum of 58 credits and an equiv-
alent number of grade points. The majority of
the students will register for their fifth semes- 
ter as juniors. In such cases the number of 
grade points was divided by the number of cred- 
it hours to obtain the grade point average. Sev- 
eral cases did not have 58 credits after four 
semesters at the University. In these instances 
the grade point average for the fifth semester 
was computed. A sufficient number of credits 
and grade points of fifth semester were added 
to yield the minimum of 58 Freshman-Sopho-
more credits. The fifteen elective war service
credits were not included for grade point ratio 
computations. They were, however, added to 
the credits a student had earned to change the 
student’s University classification. If, for ex- 
ample, a student had earned 28 credits as a 
freshman and during the third semester he was 
granted these 15 war elective credits in addi- 
tion to 15 credits of regular work, he would be 
permitted to register for his fourth semester
of University study as a junior. But the fresh-
man-sophomore grade point average would be
computed only for the 43 credits he had earned 
at the University in regular class work. Trans- 
fer credits from other collegiate institutions 
were not included in the computation of GPA’s. 
A high score indicates a high grade point aver- 
age. 
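
The exclusion rule in the example above can be sketched as follows. The record layout, the kind labels, and the grade point values are hypothetical, and the sketch does not implement the rule for borrowing fifth-semester credits to reach 58.

```python
from typing import Iterable, Tuple

# Each record is (credits, grade_points, kind). The kind labels are hypothetical.
Record = Tuple[float, float, str]


def grade_point_average(records: Iterable[Record]) -> float:
    """Grade points per credit, counting regular University class work only."""
    counted = [(credits, points) for credits, points, kind in records if kind == "regular"]
    total_credits = sum(credits for credits, _ in counted)
    if total_credits == 0:
        raise ValueError("no regular credits to average")
    return sum(points for _, points in counted) / total_credits


if __name__ == "__main__":
    # 28 freshman credits, 15 war elective credits, 15 regular third-semester credits.
    student = [(28, 56.0, "regular"), (15, 0.0, "war_service"), (15, 30.0, "regular")]
    print(grade_point_average(student))  # averaged over 43 credits, not 58
```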


41. Junior-Senior Grade Point Average,
Jr.-Sr. GPA

The credits and grade points, exclusive of 
war elective service credits, not included in 
the computation of the Freshman-Sophomore 
Grade Point Average were used to compute the 
Junior-Senior Grade Point Average. A high 
score indicates a high grade point average. 





42. Composite: 2 Credit Major Educational 
Methods Lecture Section Grade and the 2 
Credit Minor Methods Grade, EM4 

The two credit major methods lecture sec- 
tion grade was obtained from the individual in- 
structors in a manner similar to that outlined 
for the determination of the lecture grade in Ed-
ucation 75. The 2 credit minor methods grade
was obtained from the University transcript.
Each of these grades was converted to a numer-
ical score with A = 1; B = 2; C = 3; D = 4. The
sum of these two numerical scores was the
score used for this composite measure. A low
score indicates a high composite grade.











Procedure to be Followed in Treatment of Data 





Success in teaching is defined in terms of 
the following ratings: 

1. Principal’s acceptability rating 

2. Principal’s M-blank rating (1st year) 

3, 4. Two visiting supervisors’ M-blank rat-
ings.

5. Principal’s M-blank rating (2nd year). 

6. A rating obtained from an outside agency 
on a special form. 

7. Teacher’s self evaluation. 

8. Peer evaluation. 

9. Pupil evaluation. 


Four types of data were collected for each 
subject. They were: a) in-service estimates of 
success; b) measures of temperament; c) meas-
ures of personality; and d) pre-service meas-
ures of achievement. This investigation is pri- 
marily concerned with the relationships between 
a) and b), a) and c), and a) and d). 

This study will use the following proced-
ure:

1. The coefficients of correlation for each
pair of in-service estimates of success variables 
will be computed. 

2. The resultant nine by nine matrix of cor- 
relations will be factored by the multiple group 
method to determine factors underlying these 
estimates of teaching success. These hypothet- 
ical factors should provide us with a simpler ex- 
planation of the interrelationships of the teach- 
ing success estimates. 

3. The correlation coefficients of each tem- 
perament measure, each personality measure, 
and each achievement measure with each of the 
nine estimates of teaching success will be com-
puted. These correlations will form the entries
of three rectangular matrixes of orders seven 
by nine, sixteen by nine, and ten by nine respec- 
tively. 

4. The grouping of the variables and a pro-
cedure similar to that utilized in (2) will be ap-
plied to these three rectangular matrixes to de- 
termine if such an analysis will account for the 
correlations in these matrixes. 

5. Factor scores for the factors found in (2) 
will be calculated for each individual. These 
scores are estimates of the factors found by the 
analysis in (2). These factor scores will be used
as criteria of teaching success. 

6. Pearson product-moment coefficients of 
correlation will be calculated for each tempera- 
ment, personality, and achievement variable 
with these obtained criteria. 

7. Multiple regression techniques will be em- 
ployed to develop equations for the prediction of 
the criteria. 
SECTION III
ANALYSIS OF THE DATA 


IN THIS section the statistical treatment 
of the data will be described. The readers will 
be referred to tables and to selected references 
for more detailed discussions of the statistical 
methods employed in carrying out this study. 

The extreme scores, ranges, means, stand- 
ard deviations, and the number of teachers for 
whom the forty-two measures were obtained are
presented in Table II. 

The Pearson product moment coefficient of 
correlation will be used in this study to indicate 
linear relationships existing between various 
sets of scores. 
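
A minimal sketch of the product-moment coefficient follows. Because the measures were not available for every teacher, the sketch uses only the cases present on both variables and returns the size of that sub-group; representing a missing score by None is an assumption of the example, not a detail given in the study.

```python
from math import sqrt
from typing import Optional, Sequence, Tuple


def pearson_r(x: Sequence[Optional[float]], y: Sequence[Optional[float]]) -> Tuple[float, int]:
    """Product-moment correlation over cases present on both variables, with the case count."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("at least two complete pairs are required")
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with no variance has no defined correlation")
    return sxy / sqrt(sxx * syy), n


if __name__ == "__main__":
    ratings_a = [1, 2, 3, None, 5]
    ratings_b = [2, 4, 6, 8, None]
    print(pearson_r(ratings_a, ratings_b))
```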

Factor analysis will be the major statistical 
device utilized in summarizing the correlation- 
al data. 

This section will be divided into six parts. 
Part one will deal with the factor analysis of the
nine by nine matrix of intercorrelations of the 
nine estimates of teaching success. This nine 
by nine matrix has been designated the R matrix. 

In part two the analysis of the Rx matrix 
(seven by nine rectangular matrix) will be pre- 
sented. This matrix has as entries the correl- 
ations of the seven temperament variables, 
numbered in this study 10 through 16, with the 
nine estimates of teaching success. 

In part three the factor analysis of the rec- 
tangular matrix, of order sixteen by nine, des- 
ignated the Ry matrix, will be discussed. The 
entries in this matrix are the correlations of 
the sixteen personality source trait measures 
(variables 17 through 32) with the nine estimates 
of teaching success. 

Part four will be concerned with a discussion 
of the factor analysis of the Rz matrix. This 
matrix is of order ten by nine and has as entries
the correlations of ten preservice measures of
achievement and intelligence (variables number-
ed 33 through 42) with the nine in-service esti-
mates of teaching success.

Part five will present estimates of the factors
found in the analysis reported in part one.
These estimates will be in the form of factor
scores and will be used as ‘‘composite’’ criteria
in the prediction phase of this report.

Part six will discuss the results of the mul- 
tiple correlation techniques employed to predict 
the factor scores discussed in the preceding part. 
TABLE II


EXTREMES, RANGES, MEANS, STANDARD DEVIATIONS, AND NUMBER OF CASES 
OF THE FORTY-TWO MEASURES 








                     High*     Low
Variable  Code       Score   Score   Range    Mean    S.D.   No.
    1     PA             5       1       4    2.05    0.80    64
    2     PM1            5       1       4    2.16    0.87    61
    3     S1M            5       1       4    2.53    0.84    62
    4     S2M            5       1       4    2.53    0.95    58
    5     PM2            4       1       3    2.29    0.63    63
    6     SpR           12       5       7    7.12    1.86    60
    7     SE             3       1       2    2.18    0.66    62
    8     PE             8       2       6    4.02    1.45    56
    9     PuE           18       5      13   10.51    3.02    53
   10     A-T           18       5      13   12.20    3.34    64
   11     V-T           18       2      16   10.83    4.38    64
   12     I-T           19       5      14   11.75    3.26    64
   13     D-T           20       4      16   12.52    4.45    64
   14     E-T           19       6      13   12.13    3.98    64
   15     S-T           19       5      14   13.58    3.14    64
   16     R-T           19       1      18    8.22    3.08    64
   17     A             34       8      26   22.5?    5.46    64
   18     B             23      14       9   19.69    2.08    64
   19     C             48      26      22   36.84    5.00    64
   20     E             36       6      30   21.86    7.01    64
   21     F             50       8      42   29.78    8.38    64
   22     G             36      16      20   25.89    4.66    64
   23     H             50      16      34   37.95    8.51    64
   24     I             30       4      26   17.16    4.85    64
   25     L             28       2      26   15.81    5.81    64
   26     M             30       8      22   19.16    5.40    64
   27     N             34      12      22   22.95    3.90    64
   28     O             30       0      30   10.47    8.30    64
   29     Q1            30       8      22   19.31    5.58    64
   30     Q2            31       8      23   17.38    5.98    64
   31     Q3            38      10      28   26.55    5.64    64
   32     Q4            34       2      32   13.33    7.75    64
   33     PT75           7       1       6    2.89    1.74    64
   34     PTEM           8       1       7    2.61    1.78    64
   35     G73            3       1       2    1.84    0.57    63
   36     G74            3       1       2    1.90    0.53    63
   37     LG75           4       1       3    2.17    0.78    64
   38     PRHS          99      15      84   75.70   23.81    60
   39     ACEPR        100       8      92   60.30   25.31    56
   40     Fr-SoGPA    2.80    0.97    1.83    1.78    0.51    58
   41     Jr-SrGPA    2.88    1.02    1.86    1.99    0.41    64
   42     EM4           12       5       7    3.42    1.00    64





*High refers to numerical value only. 
The Use of Factor Analysis





Before proceeding to the more detailed dis- 
cussion of the findings of this study, the writer 
feels that several brief statements concerning
factor analysis should be made. 

Holzinger and Harman17 make this statement
about factor analysis: ‘‘Factor analysis is a 
branch of statistical theory concerned with the 
resolution of a set of descriptive variables in 
terms of a small number of categories or fac- 
tors. This resolution is accomplished by the 
analysis of the intercorrelations of the variables. 
A satisfactory solution will yield factors which 
convey all the essential information of the orig- 
inal set of variables. The chief aim is thus to 
attain scientific parsimony or economy of de-
scription.’’

This is the first basic problem with which 
factor analysis deals. The second basic prob- 
lem deals with the problem of describing the 
factors in terms of the original variables. 

There are many different methods which can 
be used to transform a correlation matrix into a
factor matrix. The multiple-group method of 
factoring the correlation matrix was chosen for 
use in this study. This method was first pre- 
sented by Holzinger18 and six months later by 
Thurstone. 19 This procedure is applicable to 
any correlation matrix which can be divided in- 
to sections that are linearly independent. 

Factor analysis in this study was used both 
to determine the factors common to the nine es- 
timates of teaching success and to determine 
whether these common factors underlie certain 
temperament, personality, and pre-service 
measures of achievement and intelligence. The 
results of these analyses will be reported in parts
one through four. The second basic problem of 
factor analysis involving factor estimates will be 
discussed in part five. 

The method outlined by Harris
and Schmid20 was closely followed in this study. 
Part One: Factor Analysis of the Nine In-service
Estimates of Teaching Success





The writer doubted if these nine estimates 
were measures of the same factor. It was found 
that a general factor would not account for the 
intercorrelations of the R matrix, supporting the
original contention. It seemed desirable to ob- 
tain a simpler description of these measures. 
That an arithmetical average of the measures 
in the form of a composite might not be a satis- 
factory means of summarizing this data has 
been pointed out by Holzinger. 21 

Factor analysis of the nine in-service esti- 
mates of teaching success was designed to fol- 
low these steps: 

1. Calculation of the matrix of intercorrela- 
tions of the nine estimates of teaching success. 

2. Transformation of this correlation matrix 
into an arbitrary oblique factor matrix using 
the multiple group method of factoring. 

3. Rotation of this initial oblique solution to 
a meaningful primary pattern following the meth- 
od of direct rotation to primary factor structure 
outlined by Harris. 22 

It became evident after the completion of the 
first two steps that the third step would not be 
necessary. 

Pearson product-moment coefficients of cor- 
relation were computed for each pair of the first 
nine variables. The intercorrelations and the 
final communality estimates of these variables 
are presented in Table III. This matrix has
been labeled the R-matrix for subsequent refer- 
ences. 

After several attempts at grouping the vari- 
ables and using Harris’ 23 geometrical criter- 
ion of the dimensionality of the common factor 
space, R was finally sectioned into three groups 
as follows: 





17. Karl J. Holzinger and Harry H. Harman. Factor Analysis: A Synthesis of Factorial Methods (Chicago:
University of Chicago Press, 1948), p. 3.

18. Karl J. Holzinger. "A Simple Method of Factor Analysis," Psychometrika, IX (December 1944), pp.
257-261.

19. L. L. Thurstone. "A Multiple Group Method of Factoring the Correlation Matrix," Psychometrika, X
(June 1945), pp. 73-78.

20. Chester Harris and John Schmid. "Further Application of the Principles of Direct Rotation in
Factor Analysis," Journal of Experimental Education, XVIII (March 1950), pp. 175-193.

21. Karl J. Holzinger. "Factoring Test Scores and Implications for the Method of Averages," Psycho-
metrika, IX (September 1944), pp. 155-167.

22. Chester W. Harris. "Direct Rotation to Primary Structure," Journal of Educational Psychology, XXXIX
(December 1948), pp. 449-468.

23. Chester W. Harris. "Projections of Three Types of Factor Pattern," Journal of Experimental Educa-
tion, XVII (March 1949), pp. 335-345.
TABLE III


INTERCORRELATIONS, COMMUNALITY ESTIMATES AND RESIDUAL MATRIX FOR
NINE INSERVICE ESTIMATES OF TEACHING SUCCESS


R MATRIX*

Variable        1      2      3      4      5      6      7      8

1 PA
             (64)**
2 PM1          54
             (6?)
3 S1M          63     57
              (?)   (59)
4 S2M          43     61     68
              (?)   (56)   (58)
5 PM2          4?     41     29     29
             (63)   (60)   (61)   (57)
6 SpR          ..     49     24     25     54
               ..   (57)   (58)   (54)   (59)
7 SE           1?     03     05     11     38     37
             (62)   (59)   (60)   (56)   (61)   (58)
8 PE           1?     19     09     06     28     50     33
             (5?)   (55)   (54)   (51)   (55)   (55)   (54)
9 PuE          08     15    -07     01     24     17     22     41
             (53)   (53)   (51)   (49)   (53)   (49)   (51)   (48)

[Entries shown as ? or .. are illegible in the source, as are the communality estimates and the residuals of the upper triangle.]

* Decimal points omitted.
Residuals have been placed in upper triangle, underlined.

**Number inside parenthesis indicates the size of the sub-group used to calculate the
correlation.
The correlations of each variable were sum-
med over the three groups to produce G. In this
case G is a nine by three matrix. The above op-
eration is equivalent to post-multiplying R by a
matrix E which has zero and unity elements ar- 
ranged as follows: 


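TABLE IV
E MATRIX

[Nine by three matrix of zero and unity elements, with one row for each estimate of teaching success and one column for each group; the individual entries are not legible in the source.]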





A three by three matrix, H, was obtained by 
summing the columns of G over each group. This 
matrix represents the correlations of each group 
with every other group. In matrix notation the 
above operations can be summarized as follows: 


RE =G 
E'RE =H 


Next a diagonal matrix W was constructed. 
This matrix has as elements the reciprocal square 
roots of the diagonal elements of H. 

The G matrix, the H matrix, and the W matrix 
are shown in Tables V, VI, and VII, respectively. 
Pre- and post-multiplication of H by W yields Φ,
the matrix of intercorrelations of the group axes.
This matrix is shown in Table VIII.

The structure matrix, S, is shown in Table IX.
This matrix was obtained by post-multiplying G
by W, and represents the correlations of each variable
with the oblique group factors. Post-multiplication
of S by Φ⁻¹ yields the matrix P. P is the
matrix of coordinates of the variables with respect
to these factors. The matrix P is shown
in Table X. The values in P will be called factor
loadings in the remainder of this study.

The matrix multiplication SP' = R+ yields the
nine by nine matrix of reproduced correlations.
One should note that the matrix multiplication
GH⁻¹G' will also yield this matrix of reproduced
correlations. The matrix subtraction R - R+ was
performed to yield the residual correlations
shown in the upper triangle of Table II. An examination
of these residuals reveals that the calculated
oblique solution reproduced the original
correlations very well.

It was not considered necessary to rotate this 
oblique solution to another oblique solution since 
the pattern matrix, P, indicated that each of the
variables is of complexity one.
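The sequence of matrix operations described above (E, G, H, W, Φ, S, P, and the reproduced correlations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the investigator's original computation: the 4 x 4 matrix `R_demo` and its two-group split are hypothetical values invented for the example, and the helper names (`matmul`, `inverse`, `group_factor_solution`) are likewise illustrative.

```python
import math


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse of a small, nonsingular square matrix."""
    n = len(a)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))  # partial pivoting
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def group_factor_solution(R, groups):
    """Follow the steps in the text: RE = G, E'RE = H, W, WHW, GW = S, P, SP'."""
    n, k = len(R), len(groups)
    E = [[1.0 if i in g else 0.0 for g in groups] for i in range(n)]
    G = matmul(R, E)                      # variable-by-group sums
    H = matmul(transpose(E), G)           # group-by-group sums
    W = [[1.0 / math.sqrt(H[i][i]) if i == j else 0.0 for j in range(k)]
         for i in range(k)]               # reciprocal square roots of diag(H)
    Phi = matmul(matmul(W, H), W)         # correlations among group axes
    S = matmul(G, W)                      # structure matrix
    P = matmul(S, inverse(Phi))           # pattern matrix
    R_plus = matmul(S, transpose(P))      # reproduced correlations
    return {"G": G, "H": H, "W": W, "Phi": Phi, "S": S, "P": P, "R_plus": R_plus}


# Hypothetical correlation matrix (communality estimates in the diagonal).
R_demo = [
    [0.60, 0.55, 0.20, 0.15],
    [0.55, 0.65, 0.25, 0.20],
    [0.20, 0.25, 0.50, 0.45],
    [0.15, 0.20, 0.45, 0.55],
]
sol = group_factor_solution(R_demo, groups=[{0, 1}, {2, 3}])

# Alternative route mentioned in the text: G H^-1 G'.
alt = matmul(matmul(sol["G"], inverse(sol["H"])), transpose(sol["G"]))
```

With the study's own nine by nine matrix in place of `R_demo`, and groups of variables 1-4, 5-7, and 8-9, the same function would be expected to reproduce the matrices reported in Tables V through X, provided the printed values are used as input.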

Inferences concerning the nature of the fac- 
tors were made in terms of the number of sig- 
nificant loadings in each column of the pattern 
matrix and the complexities of the variables. A
variable is said to be of complexity one if it has
only one significant loading in the pattern matrix.
In this study a minimal loading of .30 was required
for a factor to represent a contribution
to the variance of a test. Thurstone’s²⁴ criteria
for uniqueness of simple structure were used
as a basis of interpretation of the pattern values 
obtained in this study. The pattern values ob- 
tained, without resorting to further rotation and 
change of reference axes, exhibit the properties 
of simple structure outlined by Thurstone. This 
led the investigator to conclude that this solution 
does not differ greatly from a primary axis so- 
lution since it is the primary pattern in such a 
solution which will exhibit Thurstone’s proper- 
ties of simple structure. 

The tentative names and the factor loadings 
on each of these oblique factors were as follows: 


I. Beginning Teacher Rating Scale Factor

1. PA, Principal’s Acceptability Rating .783
2. PM1, Principal’s M-blank rating, 1st year .811
3. S1M, Supervisor I M-blank rating .782
4. S2M, Supervisor II M-blank rating .858


These four variables have significant factor
loadings on this factor. Each of the variables
is of complexity one. The highest loadings are
for variables 2 and 4. Variables 1 and 3 have
practically identical loadings on this factor.


II. Second-year Rating Scale Factor

5. PM2, Principal’s M-blank rating, 2nd year .858
6. SpR, Outside Agency Rating .641
7. SE, Teacher’s Self Evaluation .536


Variables 5, 6, and 7 have significant load- 
ings on this factor. Variable 5 has the highest 
loading. Each of these variables is of complex- 
ity one. 


III. Peer-Pupil Response Factor

8. PE, Peer evaluation .561
9. PuE, Pupil Evaluation .769


Variables 8 and 9, both of complexity one, 
have significant loadings on this factor. The 





24. L. L. Thurstone. Multiple-Factor Analysis (Chicago: University of Chicago Press, 1947), p. 335.

TABLE V


CORRELATIONS OF EACH VARIABLE WITH EACH GROUP
G MATRIX


TABLE VI


H MATRIX


TABLE VII


W MATRIX


TABLE VIII


Φ MATRIX
TABLE IX


CORRELATIONS OF EACH VARIABLE WITH EACH
OBLIQUE FACTOR
S MATRIX (STRUCTURE MATRIX)

Variable      I

   1.        848*
   2.        820
   3.        749
   4.        817
   5.        439
   6.        436
   7.        111
   8.        161
   9.        053

*Decimal points omitted


TABLE X


COORDINATES OF EACH VARIABLE WITH EACH FACTOR
P MATRIX (PATTERN MATRIX)

Variable      I

   1.        783*
   2.        811
   3.        782
   4.        858
   5.        050
   6.        113
   7.       -165
   8.       -025
   9.        025

*Decimal points omitted

highest loading (.769) is for variable 9.


Summary of the Factor Analysis of Teaching 
Success Estimates 








It was found that a general factor would not 
account for the matrix of intercorrelations of the 
nine estimates of teaching success. 

Three common factors identified as


I. Beginning Teacher Rating Scale
Factor,

II. Second-Year Rating Scale Factor,

III. Peer-Pupil Response Factor


were found in the analysis described above. The
factors are oblique, with sizeable correlations between
the Beginning Teacher Rating Scale Factor and the
Second-Year Rating Scale Factor (.49), and between the
Second-Year Rating Scale Factor and the Peer-Pupil
Response Factor (.64). The correlation between the
Beginning Teacher Rating Scale Factor and the
Peer-Pupil Response Factor was low (.16). All of the
correlations are positive.

From the above analysis one must conclude 
that an arithmetic average of these nine estimates 
of teaching success would not yield a satisfactory 
composite estimate. 

In the fifth part of this section, estimates of 
these factors in the form of factor scores will be 
obtained and used as ‘‘composite’’ criteria of 
teaching success. 


Part Two: The Factor Analysis of the Rx Matrix 





The correlations of each of the seven temper- 
ament traits with each of the nine estimates of 
teaching success were calculated. These correl- 
ations are the entries in the seven by nine matrix 
labeled the Rx Matrix and recorded in Table XI. 
Each of these temperament variables had been
found, in previous research studies, to be of complexity
one. An examination of the Rx Matrix reveals
that 18 of the 63 correlations are negative.
The 63 correlations ranged from -.26 (N = 58,
between Thurstone Active, 10, and Supervisor II
M-blank rating, 4) to .31 (N = 58, between Sociable,
15, and Supervisor II M-blank rating, 4).
For a random sample of 64, the standard error 
of a zero correlation is .125. Thus, a coeffic- 
ient of .25 or greater is necessary in order for 
the correlation to be significantly different from 
zero at the five percent level. For 58 cases a 
coefficient of .26 or larger is necessary for sta- 
tistical significance at the same level. The 64 
subjects used in this study do not represent a 
random sample. However, it was decided to use 
the above levels of significance in analyzing the 
data. Only the two correlations reported above
and the correlation of .26 (N = 64, between Dominant,
13, and Acceptability, 1) are significantly
different from zero by the above mentioned
standards. In addition to these three correlations,
seven other correlations are of magnitude
.20 or larger. Temperament variables
numbered 11, 13, 14, and 15 had correlations 
of .20 or larger with teaching success estimate 
6. In general these correlations tend to be quite 
small. This leads one to conclude that there is 
very little linear relationship between these 
seven temperament traits (as assessed by the 
Thurstone Temperament Schedule) and the nine 
estimates of teaching success. 
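The significance thresholds quoted above follow from the large-sample standard error of a zero correlation, 1/√N, multiplied by 1.96 for the five percent level. The short sketch below reproduces that arithmetic; the function names are illustrative, and rounding the threshold upward to two decimals is an assumption about how the reported values of .25 and .26 were reached.

```python
import math


def standard_error_zero_r(n):
    """Large-sample standard error of a correlation whose true value is zero."""
    return 1.0 / math.sqrt(n)


def critical_r(n, z=1.96):
    """Smallest two-decimal coefficient exceeding z standard errors (rounded up)."""
    return math.ceil(z * standard_error_zero_r(n) * 100) / 100


for n in (64, 58):
    print(n, round(standard_error_zero_r(n), 3), critical_r(n))
```

For N = 64 the standard error is .125, as stated in the text, and the two thresholds come out at .25 and .26 respectively.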

Factor analysis techniques were employed 
in analyzing the correlations in the Rx matrix 
(and also the Ry and Rz matrices) to determine 
if the three oblique common factors described 
in part one underlie temperament, personality, 
and pre-service measures of achievement and 
intelligence. If these factors do underlie such 
measures then an analysis similar to that em- 
ployed in the analysis of the original nine by 
nine matrix should account for the correlations 
in these three rectangular matrixes. If the residual
correlations are small, then an examination
of the appropriate pattern matrixes will reveal
the temperament, personality, and pre-service
achievement variables which have significant
loadings on the three oblique teaching
success factors. Such a procedure seemed to
be a very straightforward and sensible one
since in this study we are concerned only with
the relationships between these temperament,
personality and achievement variables and estimates
of teaching success. Thus, for example,
we were not primarily concerned with the inter- 
correlations of the temperament variables nor 
with the correlations of the temperament vari- 
ables with the personality variables. 

The Rx Matrix was post-multiplied by the E
matrix (Table III) to yield the Gx matrix (Table
XII). The matrix multiplication GxH⁻¹G' was
performed, resulting in a reproduced correlation
matrix of order seven by nine labeled Rx+. This
latter matrix was subtracted from the Rx matrix
to yield a matrix of residual correlations (Rx -
Rx+). This matrix may be found in Table XIII.
One notes that the residuals with few exceptions
are very small. It thus appears that the three
teaching success factors do underlie these seven
temperament traits. The correlations of the
temperament variables with these oblique factors
may be found in Table XIV. They were obtained
by the matrix multiplication GxW = Sx
(Structure Matrix). The Px matrix (Pattern
Matrix) showing the coordinates of the seven
temperament variables with respect to the oblique
factors is recorded in Table XV. This matrix
was obtained by post-multiplying Sx by Φ⁻¹.
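As a check on the first step, post-multiplying Rx by E amounts to summing each row of Rx over the three groups of success estimates (variables 1-4, 5-7, and 8-9). The sketch below does this with the entries of Table XI as they can be read from the printed table (decimal points omitted) and compares the first column with the values printed in Table XII. The row labels 10 through 16 and the dictionary layout are assumptions made for the example; the Group II and III sums are computed here and are not taken from the printed table.

```python
# Rows of the Rx matrix (Table XI), in hundredths; keys are temperament variables.
RX = {
    10: [-17, -4, -14, -26, -14, 15, -14, 11, 21],
    11: [12, 19, -8, -3, 5, 24, -9, 16, 5],
    12: [18, 1, -2, 7, 15, 16, -7, 6, -12],
    13: [26, 16, 6, 17, 9, 22, 12, 8, -12],
    14: [16, 16, 8, 9, 6, 20, 9, 4, 0],
    15: [23, 18, 9, 31, 12, 20, 12, -1, -19],
    16: [8, -1, 20, 1, -1, 3, 16, 0, -10],
}

# Column positions of the nine success estimates belonging to each group.
GROUPS = {"I": range(0, 4), "II": range(4, 7), "III": range(7, 9)}


def group_sums(row):
    """Equivalent to one row of Rx post-multiplied by the zero-one matrix E."""
    return {name: sum(row[i] for i in cols) for name, cols in GROUPS.items()}


GX = {var: group_sums(row) for var, row in RX.items()}

# First column of the Gx matrix as printed in Table XII.
TABLE_XII_COLUMN_I = [-61, 20, 24, 65, 49, 81, 28]
computed_column_I = [GX[var]["I"] for var in sorted(GX)]
print(computed_column_I == TABLE_XII_COLUMN_I)
```

The agreement of the computed and printed first columns supports the reading of Table XI used here.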

It is interesting to note that the correlations 
of the seven temperament variables with the 
three oblique factors are small. Variables 11 





September, 1954) ERICKSON 


TABLE XI


CORRELATIONS OF SEVEN TEMPERAMENT TRAITS WITH NINE
ESTIMATES OF TEACHING SUCCESS


Rx MATRIX

Variable     1      2      3      4      5      6      7      8      9
           (64)*  (61)   (62)   (58)   (63)   (60)   (62)   (56)   (53)

   10      -17**  -04    -14    -26    -14     15    -14     11     21
   11       12     19    -08    -03     05     24    -09     16     05
   12       18     01    -02     07     15     16    -07     06    -12
   13       26     16     06     17     09     22     12     08    -12
   14       16     16     08     09     06     20     09     04     00
   15       23     18     09     31     12     20     12    -01    -19
   16       08    -01     20     01    -01     03     16     00    -10

* Number inside parentheses indicates the size of the subgroup used to calculate the correlations
in that column.
**Decimal points omitted.


TABLE XII


CORRELATIONS OF EACH TEMPERAMENT VARIABLE WITH
EACH GROUP
Gx MATRIX

Variable      I

   10        -61*
   11         20
   12         24
   13         65
   14         49
   15         81
   16         28

*Decimal points omitted


TABLE XIII


RESIDUAL CORRELATIONS


Rx - Rx+ MATRIX





4 5 


-09 -04 
06 -07 -01 
11 02 04 
07 01 -11 
02 -09 
00 10 -10 
00 -10 








*Decimal points omitted

TABLE XIV


CORRELATIONS OF EACH TEMPERAMENT VARIABLE WITH
EACH OBLIQUE FACTOR


Sx MATRIX (STRUCTURE MATRIX)

Variable      I

   10       -189*
   11        062
   12        074
   13        201
   14        152
   15        251
   16        087

*Decimal points omitted


TABLE XV


COORDINATES OF EACH TEMPERAMENT VARIABLE WITH
RESPECT TO EACH OBLIQUE FACTOR


Px MATRIX (PATTERN MATRIX)

Variable      I

   10       -117*
   11        054
   12       -020
   13        076
   14        066
   15        097
   16        013

*Decimal points omitted

and 14 are positively correlated with each oblique
factor axis.

None of the seven variables have significant
factor loadings on Factor I. Variables 13 and
15 have significant loadings on Factor II while
variables 10 and 15 have significant loadings on
Factor III.

It should be noted that Rx+ can also be obtained
by either one of the following matrix
multiplications:


SxP' = Rx+ or PxS' = Rx+.


The reproduced correlations of the Ry and 
the Rz matrixes (to be discussed in the next two 
parts) can also be computed by substituting in 
the above equations the structure and pattern 
values associated with a particular matrix. 
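
That the structure-pattern products agree with the product GxH⁻¹G' used earlier can be seen from the definitions given in part one (S = GW, P = SΦ⁻¹, Φ = WHW). The following derivation is a sketch, assuming that W and H are nonsingular and using the symmetry of W and Φ:

$$
\begin{aligned}
S_x P' &= (G_x W)\,(G W \Phi^{-1})' = G_x W \Phi^{-1} W G' \\
\Phi &= W H W \;\Longrightarrow\; \Phi^{-1} = W^{-1} H^{-1} W^{-1} \\
S_x P' &= G_x W W^{-1} H^{-1} W^{-1} W G' = G_x H^{-1} G' \\
P_x S' &= (G_x W \Phi^{-1})\,(G W)' = G_x W \Phi^{-1} W G' = G_x H^{-1} G'
\end{aligned}
$$

Both products therefore reduce to the same seven by nine matrix of reproduced correlations.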


Part Three: The Factor Analysis of the Ry 
Matrix 





The correlations of each of the sixteen per- 
sonality variables with each of the nine estimates 
of teaching success were calculated. These cor- 
relations are the entries in the sixteen by nine 
rectangular matrix labeled the Ry Matrix and recorded
in Table XVI.

An examination of the Ry Matrix reveals that 
over half of the correlations are negative (75 of 
the 144 correlations). The sign of each of these 
correlations was changed so that they would 
agree with other correlations, as regards sign 
and meaning. (A low number for the inservice
estimates indicates a high rating whereas a high
number for the personality factor scores indicates
the plus pole of the given trait.) The 144
correlations ranged from -.32 (N = 53, between
Q1, 29, and Pupil Evaluation, 9) to .38 (N = 38,
between Q3, 31, and Self-Evaluation, 7). In addition
to these two correlations, ten correlations
were significantly different from zero by the
standards discussed in Section 2. Including these
twelve correlations, there are a total of 34 correlations
of .20 or larger in the matrix. All of
the correlations of variable 26 (M) with the estimates
of teaching success are negative. Seven
of the nine correlations of variable 28 (O) with
these estimates are negative. All nine correlations
of variable 31 (Q3) with these estimates
are positive with seven of them being .20 or
larger. Six of the nine correlations of variable
22 (G) with the nine teaching success estimates
are .20 or larger. All nine correlations are
positive. The 45 correlations of variables 17
(A), 19 (C), 21 (F), 25 (L), and 30 (Q2) with the
teaching success estimates are all less than .20.
All of the correlations tend to be quite small.
Therefore, there seems to be little relationship
between these particular estimates of teaching
success and the sixteen source traits of personality
(as assessed by the 16 P. F. Test).

The Ry Matrix was post-multiplied by the E
Matrix to yield the Gy Matrix (Table XVII). The
matrix multiplication GyH⁻¹G' yielded a reproduced
correlation matrix of order sixteen by
nine labeled Ry+. The matrix subtraction Ry -
Ry+ yielded the matrix of residual correlations.
(See Table XVIII.) These residuals
tend to be quite small. It thus appears that the
factoring method employed accounts for these
correlations. Therefore one might conclude
that the three oblique factors also underlie the
sixteen personality source trait variables. The
correlations of the personality variables with
these oblique factors may be found in Table XIX.
This sixteen by three structure matrix was obtained
by the matrix multiplication GyW = Sy.
The pattern matrix Py was obtained by the matrix
multiplication SyΦ⁻¹. This matrix will be
found in Table XX and lists the coordinates of
the sixteen personality variables with respect
to the three oblique factors. These coordinates,
as pointed out earlier in this section, are also
called the factor loadings.

The structure matrix Sy reveals that the cor- 
relations of the sixteen personality variables 
with the three oblique factors are low. Of these 
sixteen variables only variables 22 and 31 are 
positively correlated with each oblique axis. 

An examination of Table XX indicates that
variables 18 and 29 have significant loadings on
Factor I. On Factor II will be found significant
loadings for ten of the sixteen personality measures.
The variables having significant loadings
are 17, 18, 20, 23, 24, 27, 28, 29, 31, and 32.
Variables 20, 22, 23, 24, 26, and 29 have sig- 
nificant loadings on Factor III. 
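The lists of significant loadings above can be checked against the pattern values of Table XX by applying the .30 criterion stated in part one. The sketch below is illustrative only: the values are entered in thousandths as read from the printed table, with two damaged entries (variable 18 on Factor I, variable 23 on Factor III) read as -411 and -424, which is an assumption. The names `PY`, `significant`, and `complexity` are hypothetical.

```python
# Pattern matrix Py (Table XX), in thousandths: variable -> (I, II, III).
PY = {
    17: (-140, 393, -284), 18: (-411, 365, 49),   19: (-4, -30, -3),
    20: (-159, 387, -336), 21: (38, 123, -243),   22: (185, -98, 394),
    23: (-161, 617, -424), 24: (-94, 300, -479),  25: (-123, 55, -15),
    26: (-268, 268, -512), 27: (-276, 344, 79),   28: (93, -541, 265),
    29: (-407, 695, -705), 30: (-8, -22, -60),    31: (-3, 568, -215),
    32: (-59, -381, 269),
}

THRESHOLD = 300  # a loading of .30 or larger in absolute value


def significant(column):
    """Variables whose loading on the given factor column meets the criterion."""
    return [var for var in sorted(PY) if abs(PY[var][column]) >= THRESHOLD]


def complexity(var):
    """Number of factors on which a variable has a significant loading."""
    return sum(abs(value) >= THRESHOLD for value in PY[var])


factor_I, factor_II, factor_III = significant(0), significant(1), significant(2)
print(factor_I, factor_II, factor_III)
```

Under this reading of the table, the three lists agree with those given in the text, and variable 29 is of complexity three.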


Part Four: The Factor Analysis of the Rz 
Matrix 





The correlations of each of the ten achieve- 
ment variables with each of the nine estimates 
of teaching success were calculated and record- 
ed in the ten by nine Rz Matrix (Table XXI). 

Thirty of these ninety correlations are neg- 
ative. It is interesting to note that all nine cor- 
relations of variable 33 (PT75) with the nine es- 
timates are positive with four of them being 
significant at the 5% level of significance. Like- 
wise all of the correlations of variable 34 (Prac- 
tice Teaching Grade in Educational Methods 
Course) with these nine estimates of success, 
although small, are positive. These correlations
range from -.29 (N = 52, between variable
35, Grade in Education 73, and pupil evaluation,
variable 9) to .28 (N = 62, between variable
33, PT75, and variable 2, PM1). Only ten
of the correlations are .20 or larger. Eight of
the nine correlations of variable 36 with these
estimates of teaching success are negative.

TABLE XVII


CORRELATIONS OF EACH PERSONALITY VARIABLE WITH
EACH GROUP


Gy MATRIX

Variable      I

   17         02*
   18        -73
   19        -06
   20        -08
   21         19
   22         65
   23         23
   24        -08
   25        -32
   26        -71
   27        -31
   28        -41
   29        -59
   30        -09
   31         77
   32        -65

*Decimal points omitted


TABLE XVIII


RESIDUAL CORRELATIONS
Ry - Ry+ MATRIX

Variable      1

   17        -16*
   18         03
   19         03
   20        -05
   21        -05
   22        -08
   23         02
   24         11
   25         02
   26         10
   27        -17
   28        -02
   29        -04
   30         02
   31         04
   32         00

*Decimal points omitted

TABLE XIX


CORRELATIONS OF EACH PERSONALITY VARIABLE WITH
EACH OBLIQUE FACTOR


Sy MATRIX (STRUCTURE MATRIX)

Variable      I      II     III

   17        006*    143   -055
   18       -226     197    218
   19       -019    -034   -023
   20       -025     093   -113
   21        059    -015   -158
   22        201     246    361
   23        071     266   -053
   24       -025    -054   -301
   25       -099    -015    000
   26       -220    -192   -383
   27       -096     261    256
   28       -127    -325   -068
   29       -182     044   -323
   30       -028    -064   -075
   31        238     428    150
   32       -201    -236    015

*Decimal points omitted


TABLE XX


COORDINATES OF EACH PERSONALITY VARIABLE WITH
RESPECT TO EACH OBLIQUE FACTOR


Py MATRIX (PATTERN MATRIX)

Variable      I      II     III

   17       -140*    393   -284
   18       -411     365    049
   19       -004    -030   -003
   20       -159     387   -336
   21        038     123   -243
   22        185    -098    394
   23       -161     617   -424
   24       -094     300   -479
   25       -123     055   -015
   26       -268     268   -512
   27       -276     344    079
   28        093    -541    265
   29       -407     695   -705
   30       -008    -022   -060
   31       -003     568   -215
   32       -059    -381    269

*Decimal points omitted

TABLE XXI


CORRELATIONS OF TEN PRE-SERVICE MEASURES OF ACHIEVEMENT WITH NINE
ESTIMATES OF TEACHING SUCCESS


Rz MATRIX

Variable     1      2      3      4      5      6      7      8      9

   33        08*    28     25     17     11     24     28     28     12
           (64)** (61)   (62)   (58)   (63)   (60)   (62)   (56)   (53)

   34        05     06     13     19     03     08     07     19     10
           (64)   (61)   (62)   (58)   (63)   (60)   (62)   (56)   (53)

   35        09    -16     06     00     00    -09     09    -07    -29
           (63)   (60)   (61)   (57)   (62)   (59)   (61)   (55)   (52)

   36       -02    -03    -03    -10     07    -07    -04    -10    -20
           (63)   (60)   (61)   (57)   (62)   (59)   (61)   (55)   (53)

   37        19     05     00     03     12    -04     12     06    -14
           (64)   (61)   (62)   (58)   (63)   (60)   (62)   (56)   (53)

   38       -07    -04     06     14    -17    -01    -06    -10    -17
           (60)   (57)   (58)   (54)   (59)   (58)   (59)   (53)   (49)

   39        08    -18    -10     06    -23     10     04    -09    -07
           (56)   (53)   (55)   (53)   (55)   (52)   (55)   (48)   (46)

   40        06     09    -08    -08     00     18     10     10    -19
           (58)   (56)   (56)   (53)   (57)   (54)   (56)   (51)   (49)

   41        10     11     07    -02     10     24     06     13     22
           (64)   (61)   (62)   (58)   (63)   (60)   (62)   (56)   (53)

   42        13     03     07     11     18     09     06     15     04
           (64)   (61)   (62)   (58)   (63)   (60)   (62)   (56)   (53)

* Decimal points omitted.
**Number inside parentheses indicates the size of the subgroup used to calculate the correlation.

The Educational Methods Composite Lecture
Section Grade, Variable 42, correlates positively
but low with these nine variables.

The Rz Matrix was post-multiplied by the E
Matrix to yield the Gz Matrix (Table XXII). The
ten by nine reproduced correlation matrix Rz+
was obtained by the matrix multiplication
GzH⁻¹G'. This reproduced matrix was subtracted
from Rz to produce the Rz - Rz+ matrix of
residual correlations. This matrix may be found
in Table XXIII. Since these residuals tend to be
small one must conclude that the initial oblique
solution applied to this matrix satisfactorily accounts
for the correlations. The ten by three
structure matrix Sz will be found in Table XXIV.
Sz was obtained from the matrix multiplication
GzW = Sz. This matrix shows the correlations
of the ten preservice measures of achievement
with the three oblique factors. The pattern matrix
Pz shown in Table XXV was obtained by the
matrix multiplication SzΦ⁻¹ = Pz. These values
give the coordinates of these ten variables with
respect to the three oblique factor axes.

An examination of the Sz matrix reveals that 
the correlations with the oblique axes are low. 
Variables 33, 34, 41, and 42 are positively cor- 
related with each axis. 

None of these ten variables have significant 
loadings on Factor I (refer to the Pz matrix). 
Variables 35, 36, and 40 have small but significant
loadings on Factor II, and on Factor III significant
loadings for variables 34, 35, 36, and
40 will be found.

The results of the factor analyses reported 
in the first four parts of this section could be 
summarized as follows: 

1. Three oblique factors accounted for the 
matrix of intercorrelations of the nine in-service 
estimates of teaching success, and 

2. These three factors appeared to underlie 
the temperament, personality, and achievement 
measures used in this study. 


Part Five: The Factor Estimates 





In the first four parts of this section we have 
been concerned with the problem of the linear 
resolution of the variables in terms of three 
hypothetical factors. In this part regression 
techniques will be employed to describe these 
three oblique factors in terms of the nine inser- 
vice estimates of teaching success. The weight- 
ing of each variable in such an estimation equa- 
tion is determined partly by the correlation 
(structure value) of that variable with the factor. 

The equations for estimating the three factors
were determined by use of the shortened method
presented by Holzinger and Harman.²⁵
In standard form the equations are:

$$
\begin{aligned}
(1)\quad F_{I} &= .312z_1 + .272z_2 + .192z_3 + .281z_4 + .042z_5 + .042z_6 - .018z_7 - .008z_8 - \ldots \\
(2)\quad F_{II} &= .127z_1 + .049z_2 - .015z_3 - .025z_4 + .407z_5 + .303z_6 + .159z_7 + .168z_8 + .047z_9 \\
(3)\quad F_{III} &= .000z_1 + .119z_2 - .081z_3 - .074z_4 + .052z_5 + .173z_6 + .108z_7 + .375z_8 + .396z_9
\end{aligned}
$$


An estimated factor is not in standard form
although such a variable has a mean of zero and
a standard deviation equal to the coefficient of
multiple correlation.

The coefficients of multiple correlation for
these factors were as follows:

$$
\begin{aligned}
\text{S.D.}_{I} &= R_{I} = .946 \\
\text{S.D.}_{II} &= R_{II} = .896 \\
\text{S.D.}_{III} &= R_{III} = .831
\end{aligned}
$$


The estimation equations when expressed in
terms of the observed values are:

$$
\begin{aligned}
(4)\quad F_{I} &= .390x_1 + .313x_2 + .229x_3 + .296x_4 + .067x_5 + .023x_6 - .027x_7 - .006x_8 - .003x_9 - 3.01 \\
(5)\quad F_{II} &= .159x_1 + .056x_2 - .018x_3 - .026x_4 + .646x_5 + .163x_6 + .241x_7 + .116x_8 + .016x_9 - 4.14 \\
(6)\quad F_{III} &= .000x_1 + .137x_2 - .096x_3 - .078x_4 + .083x_5 + .093x_6 + .164x_7 + .259x_8 + .131x_9 - 3.48
\end{aligned}
$$


Each of the above equations was evaluated 
for each teacher. The factor scores are shown 
in Table XXVI. 
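Evaluating an estimation equation for one teacher is a weighted sum of the nine observed ratings plus a constant. The sketch below uses the coefficients of equations (2), (3), (5), and (6) as they can be read from the printed text; the rating vector `ratings` is hypothetical, and the function names are illustrative. It also shows the usual relation between the two forms of a regression weight: a raw-score weight equals the standard-form weight divided by the standard deviation of the variable, so the ratio of the two gives an implied standard deviation.

```python
# Standard-form weights, equations (2) and (3).
STANDARD = {
    "II": [.127, .049, -.015, -.025, .407, .303, .159, .168, .047],
    "III": [.000, .119, -.081, -.074, .052, .173, .108, .375, .396],
}

# Raw-score weights and constants, equations (5) and (6).
RAW = {
    "II": ([.159, .056, -.018, -.026, .646, .163, .241, .116, .016], -4.14),
    "III": ([.000, .137, -.096, -.078, .083, .093, .164, .259, .131], -3.48),
}


def estimate(weights, constant, x):
    """Weighted sum of the nine observed ratings plus the constant term."""
    return sum(w * v for w, v in zip(weights, x)) + constant


def implied_sd(standard_weight, raw_weight):
    """Standard deviation implied by a pair of standard and raw weights."""
    return standard_weight / raw_weight


ratings = [3, 3, 3, 3, 3, 3, 3, 3, 3]  # hypothetical ratings for one teacher
score_II = estimate(*RAW["II"], ratings)
score_III = estimate(*RAW["III"], ratings)

# The implied standard deviation of variable 5 from the two pairs of equations.
sd5_from_II = implied_sd(STANDARD["II"][4], RAW["II"][0][4])
sd5_from_III = implied_sd(STANDARD["III"][4], RAW["III"][0][4])
print(round(sd5_from_II, 2), round(sd5_from_III, 2))
```

The two implied values for variable 5 agree to within rounding of the printed coefficients, which offers some reassurance that the coefficients have been read correctly.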

In order to eliminate negative values a trans- 
formation to an arbitrary positive scale was 
made. 

The factor scores were standardized and this
variable was equated to an arbitrary variable X
with mean of 5 and standard deviation of 1. This
transformation can be written:

$$
(7)\quad X_{i} = \frac{F_{i}}{R_{F_{i}}} + 5
$$





25. Op. cit., pp. 278-83.
26. Ibid., p. 273.

TABLE XXII


CORRELATIONS OF EACH ACHIEVEMENT VARIABLE
WITH EACH GROUP


Gz MATRIX

Variable      I

   33         78*
   34         43
   35        -01
   36        -18
   37         27
   38         09
   39        -14
   40        -01
   41         26
   42         34

*Decimal points omitted


TABLE XXIII


RESIDUAL CORRELATIONS
Rz - Rz+ MATRIX

Variable      1      2

   33        -15*    05
   34        -06    -08
   35         09    -12
   36         02     05
   37         11    -01
   38        -08    -04
   39         12    -13
   40         04     11
   41         01     01
   42         03    -07

*Decimal points omitted

TABLE XXIV


CORRELATIONS OF EACH ACHIEVEMENT VARIABLE WITH
EACH OBLIQUE FACTOR


Sz MATRIX (STRUCTURE MATRIX)

Variable      I      II

   33        241*
   34        133     089
   35       -003     000
   36       -056    -020
   37        084
   38        028
   39       -043
   40       -003
   41        080
   42        105

*Decimal points omitted


TABLE XXV


COORDINATES OF EACH ACHIEVEMENT VARIABLE WITH
RESPECT TO EACH OBLIQUE FACTOR


Pz MATRIX (PATTERN MATRIX)

Variable      I      II

   33        161*
   34        181    -206
   35       -101
   36       -140     306
   37        006
   38        072
   39       -063     098
   40       -153     410
   41        027     030
   42        050

*Decimal points omitted





TABLE XXVI


ESTIMATES OF THE THREE OBLIQUE FACTORS EXPRESSED IN
TERMS OF OBSERVED VALUES

Teacher Number     FI     FII     FIII

The transformed factor scores are shown in
Table XXVII. A low number indicates a high
‘‘composite’’ rating.

We will consider that a score of 6 or larger
(1 S.D. or more above the mean) indicates that
a teacher tends to be considered a poor teacher;
and that a score of 4 or less (1 S.D. or more below
the mean) indicates that a teacher tends to be
considered a good teacher.
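The transformation of equation (7) and the cut-off rule just stated can be sketched together. The multiple correlations are those reported in part five; the function names and the label "middle" for scores between the two cut-offs are assumptions made for the example, since the text names only the "good" and "poor" groups.

```python
# Multiple correlations (standard deviations of the factor estimates), part five.
MULTIPLE_R = {"I": 0.946, "II": 0.896, "III": 0.831}


def transform(factor_score, factor):
    """Equation (7): divide by the multiple R, then shift to a mean of 5."""
    return factor_score / MULTIPLE_R[factor] + 5.0


def classify(transformed_score):
    """Low numbers indicate high ratings, so 4 or less is 'good', 6 or more 'poor'."""
    if transformed_score >= 6:
        return "poor"
    if transformed_score <= 4:
        return "good"
    return "middle"  # hypothetical label; the text does not name this group


# A factor estimate one standard deviation above the mean lands exactly on 6.
print(transform(0.946, "I"), classify(transform(0.946, "I")))
```

A teacher whose estimate on Factor I equals the mean of zero receives a transformed score of 5 and falls in neither named group.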

Using these standards we find that teachers 
numbered 1, 16, and 32 would be considered be- 
low average since for each of these teachers the 
three factor scores were greater than six. Tea- 
chers 25 and 53 had factor scores of 4 or less 
on each of these factors. They would be consid- 
ered as above average as teachers. We find 
four additional teachers whose factor scores 
were six or larger for factor estimate one, eight 
in addition to the three listed above whose factor 
scores were six or larger for factor two, and 
eight whose factor scores related to factor three 
are six or greater. 

Six teachers, in addition to the two mentioned, 
can be considered above average as regards fac- 
tor one; eight teachers have factor scores of 4 or 
less on factor two; and seven teachers have fac- 
tor scores of 4 or less related to factor three. 

Thus, if the factor scores (estimates of the
three factors) are considered as ‘‘composite’’
estimates of teaching success we find that there
are eight ‘‘good’’ and seven ‘‘poor’’ teachers as
determined by ‘‘composite’’ one.

Ten ‘‘good’’ and eleven ‘‘poor’’ teachers are
identified by ‘‘composite’’ two.

In a similar fashion nine ‘‘good’’ and eleven
‘‘poor’’ teachers are identified by ‘‘composite’’
three.


Part Six: The Prediction Study 





In this part the results of the multiple-correlational
techniques will be discussed.

As indicated in the preceding part the factor 
scores were considered as ‘‘composite’’ criter- 
ia of teaching success. 

The correlations of each variable (10 through 
42) with each of the three composite criteria are 
presented in Table XXVIII. 

One observes that the correlations of these
thirty-three variables with estimated factor one
in this table are almost identical to the structure
values found in the first column of the Sx, Sy,
and Sz matrices. The discrepancies between the
correlations of Table XXVIII and the structure
values in the Sx, Sy, and Sz matrices are larger
for factors II and III. These discrepancies no
doubt arise due to the linear transformation
(Equation 7, part 5) employed to obtain estimates
of these factors. The multiple coefficient
of correlation for factor I was higher than the R
for either factor II or III, and therefore the two
sets of correlations related to factor I should
show the greatest agreement. A correlation of
.25 or larger was required to indicate statistical
significance. Using such a standard, it was
found that none of the variables correlated significantly
with ‘‘composite’’ criterion one. Variables
28, 31, and 33 were significantly correlated
with criterion two; while variables 26, 27,
and 33 were significantly correlated with criterion
three.

A multiple-correlation coefficient for variables
28, 31, and 33 with ‘‘composite’’ criterion
two was computed. R was found to be .43.
The intercorrelations were as follows: Variable
28 correlated -.48 with variable 31, and
.07 with variable 33. Variable 31 correlated
.09 with variable 33.

A multiple-correlation coefficient for variables
26, 27, and 33 with ‘‘composite’’ criterion
three was also computed. R was found to
be .42. The intercorrelations in this case were
found to be r26,27 = -.26; r26,33 = -.04; and
r27,33 = .13.

In view of the very low multiple R’s reported
above it appears that the prediction of the
factor scores obtained in this study using temperament,
personality, and pre-service measures
of achievement is not promising. The two
multiple R’s reported account for only 18% of
the total variance of ‘‘composite’’ criteria two
and three.
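
The figure of 18% is the square of each multiple correlation, taken as the proportion of criterion variance accounted for. As a worked check on the two reported values:

$$
\begin{aligned}
R_{II}^{2} &= (.43)^{2} = .1849 \approx 18\% \\
R_{III}^{2} &= (.42)^{2} = .1764 \approx 18\%
\end{aligned}
$$

In each case roughly 82% of the variance of the ‘‘composite’’ criterion is left unaccounted for by the three predictors.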

Since none of the measures were significant- 
ly related to ‘‘composite’’ criterion one, no at- 
tempt was made to predict this derived esti- 
mate of teaching success. 

One must conclude that accurate prediction 
of the three obtained ‘‘composite’’ criteria us- 
ing the measures and the techniques employed 
in this study is not possible, or sensible. 

The results of each of the six parts of this 
section have been summarized at the end of 
each part. The final summary and conclusions 
will be found in the next section. 


SECTION IV 
SUMMARY AND CONCLUSIONS 


Statement of the Problem 





THE PURPOSE of this investigation was 
to develop equations for the prediction of teach- 
ing success with a minimum number of vari- 
ables and maximum stability as suggested by 
factor analyses of a variety of measures of the 
temperament, personality, and achievement of 
teachers. In order to develop a basis for simplified
patterns of prediction a factorial study
was made of the interrelationships among the
several variables. These factors should point

TABLE XXVII
TRANSFORMED FACTOR SCORES





Teacher Teacher F 
Number Number 


33 
34 
35 





.24 
. 64 
99 
87 
82 
35 
20 
68 
64 
24 
19 
04 
76 
19* 


Conovhrwn re 


91 
06 
40 
15 
40 


34 
63 
97 


20 
58 
4.36 

6. 09* 
.83** 3. 97** 
10 4. 46 

. 36* 6. 53* 


.12 
. 56 
27 
68 


5. 
4. 
3. 
4. 
6. 
4. 
6. 
4. 
3. 
5. 
5. 
5. 
4. 
4. 
8. 
5. 
4. 
4. 
5. 
5. 
5. 
5. 
4. 
3. 
5. 
4. 
4. 
6. 
3 

4. 
6 





* 1 S.D. or more above the mean.
**1 S.D. or more below the mean.

TABLE XXVIII


CORRELATIONS OF VARIABLES TEN THROUGH FORTY-TWO WITH THE
‘‘COMPOSITE’’ CRITERIA (FACTOR ESTIMATES)





Variable Fin Variable Fy 


Number 





18 16 
17 

01 

05 

08 
-02 
-04 
-01 

19 


00 


23 
09 
-21 
-01 





* Decimal points omitted
**Significant at the 5% level

the way to better estimates of teaching success.


Measures Used in the Analysis 





Forty-two variables were used in the analysis. 
These may be classified into four groups as fol- 
lows: 


Group (a). Nine estimates of teaching success, 
consisting of: 
1. The principal’s acceptability rating of the 
teacher 
2. The principal’s rating of the teacher (1st 
year) on the Wisconsin Adaptation of the 
M-blank 
3,4. A supervisor’s rating of the teacher on 
the Wisconsin Adaptation of the M-blank 
5. The principal’s M-blank rating of the tea- 
cher (2nd year) 
6. Teacher ratings obtained from an outside 
agency 
7. Teacher’s self-evaluations 
8. Peer evaluations 
9. Pupil evaluations 


Group (b). Measures of temperament. This 
group was composed of seven variables derived 
from the Thurstone Temperament Schedule. They 
were as follows: 

10. Active 

11. Vigorous 

12. Impulsive 

13. Dominant 

14. Stable (E for emotionally stable) 

15. Sociable 

16. Reflective 


Group (c). Measures of personality. This group 
was composed of the sixteen variables obtained 
from the Sixteen Personality Factor Test. They 
were as follows: 
17. Cyclothymia vs. Schizothymia 
18. General intelligence vs. mental defect 
19. Emotional stability or ego strength vs. 
general neuroticism 
20. Dominance or ascendance vs. submission 
21. Surgency vs. desurgency 
22. Positive character vs. immature depend- 
ent character 
23. Adventurous cyclothymia vs. inherent
withdrawn schizothymia 
24. Emotional sensitivity vs. tough maturity 
25. Paranoid schizothymia vs. trustful acces- 
sibility 
26. Bohemianism vs. practical concerned- 
ness 
27. Sophistication vs. rough simplicity 
28. Worrying suspiciousness vs. calm trust- 
fulness 
29. Radicalism vs. conservatism 
30. Independent self-sufficiency vs. lack of
resolution
31. Will control and character stability 
32. Nervous tension 


Group (d). Measures of pre-service achieve- 
ment. These measures were obtained from 
University files and were as follows: 
33, 34. Practice teaching grades in Educa- 
tion 75 and education methods course 
35, 36, 37. Grades in Education 73, 74 and 
lecture section in Education 75, respec- 
tively 
38. Percentile rank in high school class 
39. Percentile rank on American Council 
on Education Psychological Examination. 
40. Freshman-sophomore grade point aver- 
age 
41. Junior-senior grade point average 
42. Composite grade for two education 
methods courses 


Design of the Study 


Sixty-four teachers engaged in their second 
year of teaching in Wisconsin high schools were 
chosen for this study. This group of teachers 
had graduated from the School of Education of 
the University of Wisconsin in 1950. They were 
a subgroup of the total group of 101 (1950) grad-
uates visited during the beginning teacher visi-
tation program in the fall of 1950. The first
four estimates of teaching success were ob- 
tained during this visitation while the remain- 
ing five estimates were secured during the 
1951-1952 school year. 

All ratings and grades were converted to 
numerical scores for the analysis. The Pear- 
son product-moment coefficient of correlation 
was used throughout the study to indicate linear 
relationships between various sets of scores. 
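
As an illustration of the statistic named above, the product-moment coefficient for two sets of scores can be computed as in the following minimal Python sketch. The function name `pearson_r` and the sample ratings are hypothetical; they are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Deviations of each score from its own mean.
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cross = sum(a * b for a, b in zip(dx, dy))
    ss_x = sum(a * a for a in dx)
    ss_y = sum(b * b for b in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical ratings of five teachers by two raters (illustration only).
principal = [3, 4, 2, 5, 4]
supervisor = [2, 4, 3, 5, 3]
r = pearson_r(principal, supervisor)
```

A coefficient near zero, such as most of those reported later in this study, indicates little linear relationship between the two sets of scores.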

The matrix of intercorrelations of the teach- 
ing success estimates was factored using the 
multiple-group method of factoring. A further 
study was made to determine if such factors 
were factors underlying measures of tempera- 
ment, personality, and achievement. Estimates 
of these factors were used as ‘‘composite’’ cri-
teria of teaching success in the prediction phase
of this study. 


Results of the Statistical Analysis 





The questions originally asked in Section I 
will be answered here. A general factor does 
not account for the intercorrelations of the nine 
estimates of teaching success. The three oblique 
common factors which accounted for these in- 
tercorrelations may be designated as: 


1. Beginning Teacher Rating Scale Factor:
Principal’s acceptability rating, Princi-
pal’s M-blank rating (1st year), Supervis-
or I M-blank rating, and Supervisor II M-
blank rating


2. Second-year Rating Scale Factor: Princi-
pal’s M-blank rating (2nd year), Outside
agency rating, and Teacher’s self-evalu- 
ation 


3. Peer-Pupil Response Factor: Peer eval-
uation, and pupil evaluation


These factors were found to underlie the sev- 
eral temperament, personality, and achievement 
measures used in this study. The residual cor- 
relations in the proper rectangular matrices 
were small and therefore further factoring did 
not seem advisable. Additional factors might 
have been extracted, but these probably would 
have accounted for very little of the remaining 
variance shared by these measures and the esti- 
mates of teaching success. 

Factor estimates of the three factors used as 
‘‘composite’’ criteria would identify eight ‘‘good’’ 
and seven ‘‘poor’’ teachers (factor I); ten ‘‘good’’ 
and eleven ‘‘poor’’ teachers (factor II); and nine 
‘‘good’’ and eleven ‘‘poor’’ teachers (factor III). 
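
The table footnotes in this report mark factor estimates lying 1 S.D. or more above or below the mean. Assuming that the ‘‘good’’ and ‘‘poor’’ groups were formed by that rule (the text does not say so explicitly), the classification might be sketched as follows. The function name, the use of the population standard deviation, and the sample scores are all assumptions made for illustration.

```python
import statistics


def flag_extremes(scores):
    """Label each score 'good', 'poor', or 'middle' by a one-S.D. rule."""
    mean = statistics.mean(scores)
    # Assumption: population S.D.; the source does not state which form was used.
    sd = statistics.pstdev(scores)
    labels = []
    for score in scores:
        if score >= mean + sd:
            labels.append("good")
        elif score <= mean - sd:
            labels.append("poor")
        else:
            labels.append("middle")
    return labels


# Hypothetical factor estimates for nine teachers (illustration only).
estimates = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = flag_extremes(estimates)
```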

Prediction of these ‘‘composites’’ using 
temperament, personality, and achievement 
measures was not successful since the correla-
tions of these variables with the ‘‘composites’’
were exceedingly low, ranging from -.28 to .35. 
Only nineteen of these ninety-nine correlations 
were .20 or larger. Correlations of these three 
types of personal data with the nine original es- 
timates of teaching success were also generally 
low, ranging from -.32 to .38. Fifty-four of
these 297 correlations were .20 or larger. 
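
The totals quoted above follow from the design: variables 10 through 42 supply thirty-three predictor measures, each correlated with three ‘‘composites’’ and with nine original estimates. The short sketch below only reproduces that arithmetic; the variable names are chosen for illustration.

```python
# Variables 10 through 42 are the temperament, personality, and achievement measures.
n_predictors = len(range(10, 43))
n_composites = 3
n_estimates = 9

composite_pairs = n_predictors * n_composites   # correlations with the composites
estimate_pairs = n_predictors * n_estimates     # correlations with the nine estimates

# Counts of correlations of .20 or larger, as reported in the text.
share_composites = 19 / composite_pairs
share_estimates = 54 / estimate_pairs
substantial_total = 19 + 54
```

On these figures, somewhat fewer than one correlation in five reached .20 in either set.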


Conclusions 


The low correlations of the several tempera- 
ment, personality, and achievement variables, 
as here measured, with the nine estimates of 
teaching success and the three ‘‘composites’’
seem to indicate that the relationship of these
measures to teaching success as here measured 
has not been definitely established. 

Although low, thirty-three of the seventy-
three more substantial correlations (.20 or larg-
er) reported in this study were correlations of
the nine original and the three ‘‘composite’’ es-
timates of teaching success with the G, M, and
Q3 scales of the 16 P. F. Test and with the prac-
tice teaching grade in Education 75. Each of
the twelve correlations involving M (Bohemian-
ism vs. practical concernedness) is negative,
while the thirty-six coefficients involving G
(positive character vs. immature dependent
character), Q3 (will control and character sta-
bility), and practice teaching are positive. Due
to the consistency of these correlations, as re-
gards sign and magnitude, it would appear that 
these four measures are slightly related to 
teaching success. 

Since a general factor did not account for the 
intercorrelations of the estimates of teaching 
success, one must conclude that these several 
ratings are not measuring the same ‘‘factors.’’
Therefore, the use of a criterion obtained by 
averaging these nine measures would not be just- 
ified. The three derived ‘‘composites’’ of tea- 
ching success could be used separately in a pro- 
gram of differential prediction. 
A STUDY OF CERTAIN CRITERIA OF TEACH-
ING EFFECTIVENESS


SECTION I
THE PROBLEM 


IN 1915 BOYCE (14) pointed out the need
for objective measures of teaching effective- 
ness. He stated that this need is expressed in 
three areas, namely: (1) in the guidance of pro- 
spective teachers; (2) in the improvement of 
teachers in service; and (3) in promotion and 
dismissal. Boyce was concerned with the eval-
uation of teacher efficiency. As the work in this
area was extended, increased concern arose 
over the criteria of teaching effectiveness. Eight- 
een years later G. L. Betts (10:99) stated: 


One of the crucial problems in evalu- 
ating the education of teachers, if not 
indeed the crucial problem, is the dis- 
covery of a satisfactory criterion of 
teaching success. With such a criterion 
at hand, it would be possible immediately 
to make a scientific evaluation of the 
worth of various programs for the pre- 
service education of teachers through a 
study of the teaching efficiency of large 
groups of teachers prepared in accord- 
ance with those programs. It would be 
possible to determine what knowledges, 
skills, and attitudes are most commonly 
associated with good teaching, and thus 
to plan curricula definitely directed to- 
ward increasing these desirable assets. 
It would be possible to discover what ca-
pacities and personal traits are typical of
good teachers, and thus to select for ad- 
mission to institutions for the education 
of teachers the candidates most likely to 
become successful teachers. Scores of 
other problems relative to such matters 
as tenure, salary, methods of supervis- 
ion, the value of practice teaching, age 
qualifications, requirements for certifi- 
cation, and the like, could be studied ex- 
perimentally. In the absence of a valid 
and reliable criterion for teaching suc- 
cess, solutions for these urgent prob- 
lems must be sought in the realm of ex- 
pert opinion. 





As late as 1949 Ryans (58:696) wrote: 


Certainly until we are able to establish 
adequate criteria of teaching competency, 
our whole system of teacher training, ap- 
pointment, promotion, and tenure funda- 
mentally is on shaky ground. 


A multitude of attempts to establish a satis- 
factory criterion have been made during recent 
years. The criteria that are in general use at 
present may be divided into three categories: (1)
merit ratings; (2) measures of pupil status and 
change; and (3) tests of abilities thought essen- 
tial to success. Rating scales may be further 
classified with respect to the person using the 
scale, namely the supervisor, administrator,
pupil, teaching colleague, or the teacher him- 
self. Pupil status and change is measured by 
some form of test which is designed to estimate 
the extent to which certain objectives of instruc- 
tion are attained. In some cases these tests 
cover only certain selected aspects of subject 
matter, but often they attempt to discover status 
and change in areas such as attitudes, interests, 
appreciations and information. Evaluations of 
this type are sometimes made for a single school, 
city, county, or even an entire state. Teacher
examinations usually measure professional and 
subject matter information, mental capacities, 
cultural status and professional interests and
activities.

This study is concerned with the degree to
which the various criteria of types 1 and 2 agree
in their rankings of teachers.


Statement of the Problem 





The primary purpose of the study reported 
herein is to investigate the relationships among 
certain criteria of teacher effectiveness by the 
use of comparative and correlational techniques. 
A secondary aim is to trace the historical de- 
velopment of these criteria and to note justifi- 
cations offered for their use. The investiga- 
tion also serves as a followup of selected teach- 
ers holding the University of Wisconsin Teach- 
er’s Certificate. 

It should be borne in mind that this is not an
attempt to validate criteria of teaching compe-
tence. Such validation implies that a known
standard exists and that comparisons can be
made to discover the extent to which the partic-
ular criterion correlates with the standard. In
this study no such standard is accepted or post-
ulated.

# The author wishes to express his appreciation to Dr. A. S. Barr, his major professor, for his en-
couragement, direction, and inspiration throughout this investigation.

Specifically the investigation seeks to answer 
the following questions: 


1. What justifications are commonly offered 
for the use of various criteria? 

2. To what extent are these criteria correl- 
ated? 

3. To what extent do the different estimates 
made by the use of the same instrument on the 
same group of teachers by different raters agree? 


SECTION II 


AN OVERVIEW OF THE CRITERIA SELECTED 
FOR STUDY 


THE EDUCATIONAL literature is filled 
with materials relative to the criteria of teach- 
er effectiveness. Some of the material is based 
on research; some on philosophic principles and 
logic; and some primarily on opinion, often 
founded on casual observation and intuition. The 
reader needs to exercise considerable judgment 
and discretion in picking his way through the rel- 
evant and irrelevant. 

The following discussion represents the ma- 
terial that the writer felt significant in consid- 
ering the strengths and weaknesses of the select- 
ed criteria. The organizational plan takes up 
first rating systems in general, then proceeds to
specific rating categories designated by the clas-
sification of the rater, continues with a brief
consideration of teacher evaluation by pupil 
achievement, and concludes with a brief sum- 
mary. 

If the criteria of teaching effectiveness are 
to be considered in their proper perspective, 
cognizance must be taken of the foundations up-
on which their validity rests. Every criterion is
an arbitrary standard which serves to evaluate. 
Most of the criteria in common use have been 
validated by some accepted system of values. 
Cook (in 43:1) states the case very well as follows:
‘‘The value of measurement depends on the ex-
tent to which the relationships established are
crucial from the social point of view.’’ The ul-
timate validation of criteria of teaching effec-
tiveness rests with their agreement with the com-
posite judgments of the individuals who are con-
cerned about the problem. 

Pittenger (50) states that logically there are 
three bases for estimating teaching success or 
ability. It is possible to judge effectiveness by 
the results produced, by the processes employed
in teaching, or by the equipment that the teach- 
er possesses for teaching. The last two items 
will be considered along with rating scales while 
the first will be considered with pupil achieve- 
ment. 








Rating Scales as Criteria 


It seems inevitable that teachers will be eval- 
uated with or without the aid of a measuring de- 
vice or instrument. If such evaluations can be 
made to agree with the judgments of society to 
a higher degree of consistency through the use 
of some instrument, then the use of such a tool 
is valid for that purpose. A number of rating 
scales have been developed with that objective 
in mind. 

The rating scales that have been widely used 
in the field of teacher evaluation may be separ- 
ated into two broad categories: (1) general rat- 
ings which are all-inclusive regarding the mer- 
it of the teacher; and (2) specialized scales 
which consider traits and qualities or note spec- 
ific behaviors which are believed to reflect the 
teaching ability of the teacher. 

The first of these types is criticized in that 
it is a subjective personal evaluation which may 
fail to give proper consideration and weighting 
to all the important factors contributing to teach- 
ing success or failure. A rating or ranking is 
given which states the position held by the tea- 
cher with reference to other teachers accord- 
ing to the standards of the rater. 

The second is often criticized with respect 
to the method of selecting and weighting the fac- 
tors used. Experts in the field often disagree 
as to which traits, behaviors, and qualities are 
essential to teaching success and the extent to 
which each contributes. Such a scale does serve 
to emphasize those traits, qualities and behav- 
iors which the scale producers have selected as 
the attributes of good teaching. When a scale 
of this type is used with a number of raters, it 
may be considered as the definition of a good 
teacher. A number of subjective judgments 
have been substituted for the grand comprehen-
sive evaluation of the first type. There is a
strong tendency for the rater who has a strong
conviction regarding the overall merit of the
ratee to rate him accordingly on all items of the
point scale. This tendency is known as the ‘‘halo
effect.’’

The semantic problems in scale construction 
are great. The use of poorly defined terms tends
to reduce the inter-rater objectivity of rating 
scales. The words employed often have differ- 
ent meanings for different raters. Ambiguity 
must be avoided. Detailed definitions of terms 
and personal conferences among raters might 
be very helpful in this respect.
One of the outstanding weaknesses of rating
scale procedures is the tendency of the rater to 
over-rate the ratee. An excellent example of 
this is the in-service follow-up by Goetsch (28) 
of 237 Iowa State Teachers College graduates by 
means of a rating scale incorporating 20 char- 
acteristics. He found that the teachers were 
rated as follows: superior, 30 percent; above
average, 52 percent; average, 15 percent; and
below average, 3 percent. While the writer has
only the highest regard for the training institu- 
tion involved, he does seriously question that 
only eighteen percent are only average or less. 
Knight (40), in studying this tendency to over- 
rate, found that it increased with the length of 
acquaintance of rater and ratee. Perhaps the 
best explanation of the tendency to over-rate 
lies in the fact that raters substitute their own 
standards for those employed in the rating de- 
vice. Symonds (60:693) expresses this very 
aptly: 


Every supervisor has his own expec-
tations and unique requirements in a
teacher. . . . It will never be possible
to find agreement in teacher rating as 
long as principals and supervisors have 
different standards for judging teachers. 


Symonds emphasizes the fact that ratings should 
be made only after considerable time has been 
spent in observation. The impression obtained 
in the first few minutes may be deceiving in that 
the classroom containing an observer is not the 
natural habitat of the student and his reactions
differ accordingly. 

The bitterest attacks on rating scales pertain 
to the lack of reliability in these instruments. 
The variations in results that occur may come 
as a consequence of change in the teacher, change
in the observer, or in the interpretation of the 
scale. Ratings are certainly affected by the 
length and conditions of observation, the attitudes 
of persons involved, the social and emotional 
status of the pupils, the prejudices of the obser- 
ver, and the nature of the occasion when the last 
rating took place. It is not at all surprising that 
the statistical reliabilities quoted in the litera- 
ture show such large variations. 

Frequent reference is made to a statement by 
Rugg (57) that the reliability of a rating varies 
directly as the square of the number of observ-
ers rating independently. While this article
was written some years ago, it still offers some-
thing for those interested in ratings to consider.
It discusses the requirements and conditions un- 
der which the increase in reliability may be ex- 
pected. The statement should be considered 
only in conjunction with the conditions described 
in the article. 
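
The conditions set out in that article are not reproduced here. For comparison only, the sketch below uses the Spearman-Brown formula, a standard psychometric result that is not taken from Rugg (57), to show how the reliability of a pooled rating is expected to rise as independent raters are added. It assumes parallel raters, and the single-rater reliability of .50 is a hypothetical figure.

```python
def pooled_reliability(single_rater_reliability, n_raters):
    """Spearman-Brown estimate for the average of n parallel, independent raters."""
    if n_raters < 1:
        raise ValueError("n_raters must be at least 1")
    r = single_rater_reliability
    return n_raters * r / (1 + (n_raters - 1) * r)


# Hypothetical single-rater reliability of .50 pooled over 1, 2, and 4 raters.
examples = {k: round(pooled_reliability(0.50, k), 3) for k in (1, 2, 4)}
```

Under these assumptions the gain from each added rater diminishes, which is one reason any simple rule about the number of observers should be read together with its stated conditions.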

Schultz (59) warns that effectiveness can not
be accurately estimated by a summation of rat-
ings, which is essentially the same as the aver-
aging process. He makes a point of stressing
the fact that evaluative processes should in- 
clude the person being rated and should be for 
the purpose of improving teaching. 

In spite of the deficiencies of rating scales, 
they continue to be used. One needs to remem-
ber, as Hansen (33) points out, that teachers
are often hired or fired on the basis of subjec- 
tive information only. The practical world 
insists that decisions be made. Certainly a 
valid evaluation might be exceedingly helpful
for the guidance, protection, and improvement 
of teachers. 

Baier (3), in discussing officer rating sys-
tems, states: ‘‘In some cases ratings may be
the best criteria because value judgments are
essential elements.’’ The same may be true
of teacher evaluation. 


Specific Types of Ratings 





Teacher ratings may be made by the teach- 
er himself, by supervisors, by pupils, by peers, 
and others. Perhaps the most frequent type of
rating is that by the supervisor. One of the out-
standing advantages of this type of rating is that
a number of teachers within the system may be 
ranked in order of excellence according to the 
supervisor’s standard. Inasmuch as the super- 
visor has the opportunity to observe the teach- 
er on more than one occasion, the probability 
of that judgment being made on an unusual per- 
formance is very slight. Since supervisors are 
expected to diagnose difficulties and aid teach- 
ers as part of their duties, they may be expect- 
ed in general to be well qualified to judge teach- 
ing competence. At the times when ratings 
must be obtained for areas greater than that 
served by a single supervisor, the ratings are 
subject to additional error arising from differ- 
ences in raters. 

Teachers have also been evaluated by stud- 
ents. The basic assumption is that because of 
his close contact with the teacher the pupil has 
information not possessed by others for render- 
ing sound judgments about the effectiveness and 
ability of the teacher. The use of such rating 
is questioned by many, however, who argue that 
pupils lack the maturity to judge competently. 
Lack of experience with teachers in any quant- 
ity prevents the differentiation of superior and 
mediocre teachers on the lower grade levels. 

It is often further argued that they lack the ex- 
perience and maturity to pass judgment on the 
value of the course contents. Nonetheless, 
Cook and Leeds (21), and Albert (2) state that 
pupil evaluation of teaching is reliable, valid, 
and practical. Bryan (18) urges the use of pu-
pil evaluation because: (1) the learner’s atti-
tude is significant in the learning situation; (2)
ratings emphasize the need of improvement; (3)
ratings increase the staff’s concern over teaching
problems; (4) pupil ratings are an index of ef-
fectiveness because teacher influence is propor-
tional to the respect and admiration of pupil for
instructor; and (5) rating trains pupils to be critical
citizens in a democracy. Boyce and Bryan (15)
found that pupil opinions tended to be quite stable 
over a five-year period. Some universities (37, 
51) use student ratings for faculty evaluation, 
both for the purpose of improving instruction 
and reemployment. 

Ratings of instruction by colleagues, while
not as popular as pupil evaluation, have been giv-
en considerable support. It is argued that by in-
terclass visitations and through the school activ-
ities, the peers become sufficiently familiar
with the work of the other teachers to pass judg- 
ment on the competency of their teaching. It is 
further argued that the process is democratic 
and important in that respect. Evaluation by 
peers presents problems, particularly if no 
means are provided for interclass visitation. 

Many persons have advocated a system of self-
evaluation, usually in order to improve in ser-
vice. It would seem that the teacher is in the best
position to judge the quality of work which he is
doing. A suitable standard for comparison
is, however, missing in most cases. Seldom, if ever, can
a person evaluate his own work objectively.

It can be readily seen that we have a number 
of criteria available. Each of these criteria has 
at least one difficulty associated with it as well 
as some advantages. Cureton (in 43:639) states 
the case for rating scales very well: 


When well constructed and rightly used, 
behavior check lists and rating scales help 
materially in making the post recording 
and judgments of different observers more 
comparable, in reducing differences in the 
interpretation of the basic definitions and 
agreements, and in minimizing the dis-
turbing effects of selective attention and
selective recall. 


Pupil Achievement as a Criterion 





Teacher evaluation experts are almost uni- 
versally agreed that the measure of true effec- 
tiveness as a teacher is the change that is pro- 
duced in the pupils taught by that teacher. While 
one may agree that pupil gain is the only valid 
criterion of teaching effectiveness, validity im- 
plies that such a criterion also be reliable as 
well as relevant. Reliabilities reported in pu- 
pil gain studies have been consistently low. 

Many school people use the criterion of pu- 
pil status to evaluate instruction. Some school 
districts require that pupils take standardized
tests at the completion of the school year. Cer-
tain states have statewide testing programs and
pupil promotion is based on the score produced 
on that test. The teacher is evaluated on the 
basis of whether her pupils pass or fail. While 
this practice of teacher evaluation is decreas- 
ing, it is not extinct. 

Such a practice is unfair in that it does not 
take into consideration the ability of the pupils 
to learn nor the conditions under which the learn- 
ing takes place. An improvement was made 
when schools came to use the achievement quo- 
tient which considered the intelligence of the 
pupil. Using a formula in which the score equals the gen-
eral average of the class plus one hundred min-
us the average I.Q. of the class, Brooks (17)
rated the teachers in his district. He says,


In addition to a substantial general 
raise in salary throughout the district, 
the school boards were persuaded to 
grant special increases of one or two 
dollars per week to teachers who rated 
ninety-five percent or better. 


With the passage of time these procedures fell 
into disrepute with both curriculum experts and 
statisticians.
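
The formula attributed to Brooks (17) above can be written out as a brief Python sketch. The function name and the two sample classes are hypothetical, and the source does not state the units of the ‘‘general average’’; a percentage scale is assumed here.

```python
def brooks_score(class_general_average, class_mean_iq):
    """Score = general average of the class + 100 - average I.Q. of the class."""
    return class_general_average + 100 - class_mean_iq


# Two hypothetical classes with the same general average but different mean I.Q.
duller_class = brooks_score(class_general_average=85, class_mean_iq=92)
brighter_class = brooks_score(class_general_average=85, class_mean_iq=108)
```

The adjustment credits a teacher whose class averages below 100 in I.Q. and penalizes one whose class averages above it, whatever else may have influenced the class average.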

The modern educationist claims that schools 
must teach more than subject matter and that 
attitudes, appreciations, ideals, and personal- 
ity improvement are equally important. 

Coy (23) in 1930 warned that the accomplish- 
ment quotient could be used to evaluate teachers 
only under special conditions. Modern educa- 
tors realize that many other factors besides in- 
telligence affect pupil grades and have used re-
gression equations and similar techniques in an
attempt to predict the scores that pupils should 
obtain if instruction were equal. The discrep- 
ancies between the predicted scores and the ac- 
tual scores have then been used as a measure 
of the effectiveness of the teaching. 

Almost any book on measurement (54, 43) will 
discuss the measurement errors to which our 
test procedures are subject. The test scores 
of a pupil simply indicate that, in all probabil- 
ity, his ‘‘true’’ score is within certain limits 
of the score he produced on the test. Also, pu- 
pil accomplishment is affected by many factors 
which we are unable to measure satisfactorily 
at the present time and it is questionable if we 
ever will. Many agencies other than the teach- 
ers affect learning. Parents, books, playmates, 
television, radio, and many other forces moti- 
vate and direct learning in present day society. 

A common danger which arises out of the 
testing of subject-matter is the tendency of 
teachers to prepare for the test by urging the 
student to cram and memorize test items 
rather than emphasize understanding and broad
concepts.

(Vol. XXII, September, 1954)

A great deal of emphasis has been placed on 
pupil gain as a measure of teaching effective- 
ness. Thorndike (in 43) points out that difference 
scores have a low reliability, often approaching 
zero. Thus, they may have little or no validity 
as measures of teaching effectiveness. 
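
The point about difference scores can be illustrated with the usual classical-test-theory formula for the reliability of a difference between two equally variable measures. The formula is a textbook result and is not quoted from the source cited above; the function name and the reliabilities used are hypothetical.

```python
def difference_score_reliability(r_pre, r_post, r_pre_post):
    """Reliability of (post - pre), assuming equal pre-test and post-test variances.

    r_pre and r_post are the reliabilities of the two tests;
    r_pre_post is the correlation between them.
    """
    if r_pre_post >= 1:
        raise ValueError("correlation between tests must be below 1")
    return ((r_pre + r_post) / 2 - r_pre_post) / (1 - r_pre_post)


# Two tests of reliability .80 that correlate .70 with each other.
moderate = difference_score_reliability(0.80, 0.80, 0.70)
# The same tests correlating .80: the gain score carries no reliable variance.
none_left = difference_score_reliability(0.80, 0.80, 0.80)
```

As the correlation between pre-test and final test approaches the reliabilities of the tests themselves, the reliability of the gain falls toward zero.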

Many tests are so constructed that a large
gain is quite easily obtainable if the pupil has a 
low pre-test score, but gain may be restricted 
if the original score was high. This is espec- 
ially true if the test has a low ‘‘ceiling’’. Thus, 
if pupil gain is considered as the difference be- 
tween the pre-test and the final, the pupil who 
had a high score on the pre-test is under a con-
siderable handicap. Special statistical and graph-
ical techniques have been used to compensate
for this handicap. In most instances they have 
not proved to be entirely satisfactory. 


Summary 


A study of the criteria used for the evaluation 
of teaching effectiveness leads one to appreciate
the difficulties present in all phases of such re- 
search. The importance of evaluating teachers 
has forced educational leaders to use such meth- 
ods as exist since new methods have not been de- 
veloped. The great questions facing education 
today in this area are the following: 


1. Can these criteria be used singly and
independently or must composite measures be 
formed? 

2. If composite measures are formed, what 
weighting must be given each component? 

3. Can other measuring devices of greater 
reliability and validity be developed? 


SECTION III 
THE METHOD OF THE INVESTIGATION 


The Design of the Investigation 





DURING THE fall semester of the 1950- 
1951 school year, 101 graduates of the College 
of Education, University of Wisconsin, who hold 
the University Teachers Certificate were visited 
while teaching at the school where they were em- 
ployed. These visits were part of a followup of 
its graduates by the School of Education. The 
purpose of the visitation program was two-fold. 
In the first place, an attempt was made to gain 
information concerning the problems of these be-
ginning teachers and to give them any help possi-
ble. In the second place, an evaluation of their
teaching was made in order to check the results 
of the University program of teacher training. 

The success of these new teachers during 
their first semester was determined by several
methods. A team of two visiting supervisors
rated them on a short form of the Wisconsin
Adaptation of the M-Blank during their trip to 
the school. At the same time they secured an 
‘‘acceptability rating’’ from the principal by 
means of an interview. The ‘‘acceptability cri- 
terion’’ was broader than the usual evaluation 
in that it was concerned with the teacher as a 
member of both the school and the community. 
Another measure of the beginning teacher’s ef- 
fectiveness was obtained on a ‘‘participator’s 
rating scale’’. 

After considering the criteria employed in 
the 1950-1951 studies, the writer felt there was 
a need to see if the criteria used were compre-
hensive enough to measure teaching effective-
ness adequately. This is not intended to be a
criticism of the previous study which was very 
well done by thoroughly competent and sincere 
individuals. The previous investigations served 
to stimulate this writer to search the literature 
for a definite relationship among the criteria 
commonly employed in evaluating teaching ef- 
fectiveness. Specifically stated, the problem 
settled upon was to discover what relationships,
if any, exist among such criteria as prin-
cipal’s ratings, pupil ratings, peer ratings,
self ratings, and criteria based on pupil achieve-
ment.

The writer worked with the Teacher Person-
nel Research Committee, University of Wiscon-
sin. The Committee gathered considerable data
about Wisconsin graduates in the field during 
1951-1952. 

One decision of the committee was to set
up a testing program for the pupils taught by 
the teachers to measure their attainments in 
the subject-matter areas. The original plan 
called for a measurement of achievement in the
area of concomitant learnings as well as foun-
dational learnings. After some discussion the
measurement of concomitant learnings was
omitted by the committee for two reasons. They 
felt that such items as appreciations, attitudes, 
interests, and personality are subject to large 
measurement errors. Grave doubts were ex- 
pressed as to our ability to measure gains in 
these areas. Furthermore, since the pupils 
taught by these teachers are in high school and 
subject to the influences of a number of teach- 
ers as well as classmates and others, it was 
decided that if gains were detected in these ar- 
eas, there would be no real justification for 
ascribing these gains to the teacher being stud- 
ied. Since the subject matter areas tend to be 
more specialized, such changes might reason- 
ably be expected to be the effect produced by 
the teacher in that area. It was recognized that 
some changes may result from maturation. The 
comparison is, however, of teachers whose 
classes contain a number of pupils, wherein it
seems reasonable to assume that the mean of 
the gains resulting from maturation will be ap- 
proximately the same for the various schools. 
The addition of a constant to the mean gain 
score produced by the teacher is immaterial 
because the purpose is to rank the teacher rather 
than determine his absolute effectiveness. 

In consideration of the fact that factors other 
than the teachers influence learning, measures 
were obtained of the status of the pupils with re- 
spect to intelligence, age, and average scholas- 
tic standing in order to determine how much of 
the learning could be attributed to these factors, 
assuming that the effects of all other factors are 
negligible. After the relationships of the select- 
ed factors to pupil change had been determined, 
regression equations were set up for predicting 
the gain to be expected. The mean discrepancy 
for each class between the actual scores and the 
predicted scores may then be considered a meas- 
ure of teaching effectiveness. Such a measure is 
sound only if the variations that are due to differ- 
ences in the groups of those variables which af- 
fect learning have been removed. 
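
The discrepancy measure described above can be sketched in a few lines. This is an illustration only, not the committee's own computation: it borrows the Course I pretest equation printed in Table V, while the pupil records and the class labels X and Y are invented for the example.

```python
from statistics import mean


def predicted_pretest(pr, sa):
    """Course I pretest prediction, using the coefficients printed in Table V."""
    return 0.17 * pr - 1.40 * sa + 15.3


# Hypothetical pupil records (not from the study):
# (class label, percentile rank in intelligence, scholastic average, actual pretest)
pupils = [
    ("X", 40, 6, 15.0),
    ("X", 60, 4, 21.0),
    ("Y", 40, 6, 11.0),
    ("Y", 20, 8, 6.0),
]


def class_mean_discrepancies(records):
    """Return the mean of (actual - predicted) score for each class."""
    by_class = {}
    for label, pr, sa, actual in records:
        by_class.setdefault(label, []).append(actual - predicted_pretest(pr, sa))
    return {label: mean(values) for label, values in by_class.items()}


discrepancies = class_mean_discrepancies(pupils)
```

A positive class mean indicates pupils scoring above what their intelligence and scholastic standing would predict; a negative mean indicates the reverse.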

Since the scores of pupils on achievement ex- 
aminations are occasionally regarded as a cri- 
terion of teacher efficiency, it was decided to 
use this as one of the criteria to be studied. Ac- 
cordingly, a score was predicted for each pupil 
by means of a regression equation. The mean 
difference for each class between predicted 
scores and attained scores was used as a measure 
of teaching effectiveness. When certain unavoidable 
delays made it apparent that about 
three months of school would elapse before the 
initial testing, it was decided to use both initial 
and final test discrepancies as measures of teaching 
ability. It was felt that if a teacher could 
be found who would place at the top in all three 
of these pupil achievement evaluations, that 
teacher could be considered as an excellent one. 

The ratings of the teachers by themselves, 
their peers, their pupils, the principal, and an- 
other agency were collected through correspond- 
ence. The intercorrelations among these criter- 
ia were computed and will be presented later. 


Representativeness of the Group Selected 





It would be difficult to set up pupil achieve- 
ment criteria for all the teachers in the group 
that were visited because there was great diver- 
sity in the classes taught. Accordingly, the de- 
cision was reached to select a group of 30 to 35 
teachers whose classes would be most amenable 
to the testing program. 

Since the study is concerned with criteria of 
the teaching effectiveness of Wisconsin graduates, 
it was decided to select the group to be 
studied from the group visited last year. Of 
that group only 64 were teaching in Wisconsin 
high schools during the current year. The re- 
mainder had left the teaching field for the armed 
services, marriage, more lucrative positions, 
or other fields that seemed to be more compat- 
ible with their interests and abilities. The group 
under consideration was augmented by selection 
of other 1950 graduates of the University of Wis- 
consin teaching in the state high schools. After 
a preliminary screening a questionnaire form 
(Appendix II.A)* was sent to each teacher on the 
list of eligibles. Approximately 89 percent of 
the group returned the questionnaire. 

The 73 returned questionnaires were exam- 
ined with respect to courses, length of class 
periods, class size, grade level, textbook used, 
and willingness of the teacher to cooperate. On 
the basis of this inspection, 32 teachers were 
selected for participation in the study. Later 
two teachers were dropped from the study. In 
one case the teacher changed positions during 
the school year. The other was found to be 
teaching in a position which differs greatly from 
the circumstances under which most of the Uni- 
versity of Wisconsin graduates teach. 

Almost every attempt to show that a sample 
is representative of the total population from 
which it is drawn is predestined to failure. The 
investigator may stratify the population with re- 
spect to as many variables as he may desire 
and ascertain that his sample is representative 
of this larger population insofar as the charac- 
teristics considered are concerned. However, 
since the number of variables which may be used 
to compare groups is almost infinite, the investigator 
must draw the line at some point and assume 
that further comparisons are unnecessary 
and irrelevant. 

The comparisons are irrelevant when the 
basic criterion for comparison does not mater- 
ially affect the trait which is under investigation. 
Since this study is concerned with the relation- 
ships of various criteria of teaching effective- 
ness, and no information is available at the pres- 
ent time as to the way in which factors affect 
teaching effectiveness as measured by all these 
criteria, there is no characteristic which could 
not be assumed to be irrelevant. However, since 
it is reasonable to suppose that there are some 
factors which do affect teaching effectiveness, 
an attempt is made to show the group represent- 
ative of graduates of the University of Wiscon- 
sin. 

The writer is aware that there is some risk 
involved in making inference from this sample 
to the larger population. However, he feels 
that to restrict any conclusions which might be 
drawn to the group studied alone is pointless. It 
is pointed out that inferences drawn in sampling 
studies may also be in error, but possess the 
advantage that the investigator is in position to 
state the probability of being in error. 

It may be profitable to see how these selected 
teachers compare in some respects with another 
group of 1950 College of Education graduates. 
Mr. Harley E. Erickson of Superior 
State College, Superior, Wisconsin, was concurrently 
conducting a factorial investigation of 
the teaching abilities of the 64 teachers frequently 
referred to in this section. Table I gives a 
brief comparison of the two groups. 

The small differences which exist in mean 
percentile rank on the Psychological Examination 
and in high school class between the two 
groups may possibly be explained in that our 
group is composed of teachers in the academic 
courses and does not include fields of special 
abilities such as fine arts, home economics, industrial 
education, and music. The sex difference 
may possibly have arisen from the same 
cause. 

In Table II the group is compared with groups 
of the 1948 and 1949 graduating classes of the 
College of Education, University of Wisconsin, 
on the basis of means and standard deviations in 
general grade point average. It would appear 
that the group studied herein is slightly superior 
to the other groups in this respect. This difference 
may possibly be the result of selective 
processes such as the restriction of the group 
to teachers of academic subjects and the fact 
that by the second year a number of people have 
left the teaching profession. 

The teachers studied in this investigation may 
be considered as representative of the population 
of Wisconsin College of Education graduates 
teaching in this state. They have met the re- 
quirements for entrance to the University and 
to the College of Education. While enrolled in 
the College of Education they have all passed 
through approximately the same training exper- 
ience and indoctrination. Upon graduation they 
have secured positions in the state high schools. 
In these and other respects they have certain patterns 
of homogeneity that are inescapable. While 
there are many aspects in which the parent population 
is heterogeneous, this is also true of 
our group. 

Insofar as can be determined at the present 
time, the group does not differ radically from 
the parent population. Small differences do exist 
in the following: (1) the courses taught; (2) 
the means and standard deviations 
of grade point averages, percentile rank 
in intelligence, and percentile rank in high 
school class; (3) expected differences in individuals. 
Since the same selective processes 
might be expected to operate in the future as 
have in the past, one might well expect this 
group to be truly representative of teachers ob- 
taining the University Teachers Certificate and 
employment in the State of Wisconsin. Inferences 
drawn from this sample can be applied 
to the general population if one wishes to make 
the assumptions noted above. 


Data-Gathering Devices Employed 





The measuring devices used in this study 
may be considered in two groups: (1) those applied 
to the teacher, and (2) those applied to pupils 
taught by the teacher. The measures applied 
to the teachers are rating scales of various 
types. Three of these scales are adapted 
from the Student Reaction to Instruction Questionnaire 
developed by Roy C. Bryan of the campus 
school at Western Michigan College of Education, 
Kalamazoo, Michigan. The measures 
applied to students are achievement tests in 
subject matter areas. The tests used in this 
study were selected after securing the opinions 
of experts in these particular subject matter 
areas, by reference to Buros (66), and by inspection 
of copies of the tests. No test was used if 
its reliability did not exceed 0.90. A form (Appendix 
II.B) was sent to each school to secure 
information about the age, intelligence, and average 
scholastic standing for each pupil in the 
classes where the achievement tests were given. 

The teacher rating scales used are listed: 

1. A short form of the Wisconsin Adaptation of 
the M-Blank. Originally M-Blank, Data for 
Individual Staff Members, 1940 edition, Cooperative 
Study of Secondary School Standards. 
(Appendix II.C) 

2. A pupil rating form, adapted from Bryan. 
(Appendix II.F) 

3. A teacher self rating form, adapted from 
Bryan. (Appendix II.E) 

4. A peer rating form, adapted from Bryan. 
(Appendix II.D) 

5. Administrative X Criterion (confidential, i.e., 
not open for inspection). 


Pupil Achievement Tests: 


1. Typing: Competent Typist Test, 
Gregg Awards Department, 330 West 42nd 
St., New York 36, N. Y. (Appendix III.A) 

2. Mathematics 7: Colorado State College of 
Education Basic Skills in Arithmetic Test, 
Form A, Grades 6-12, 1945, Chicago, Science 
Research Associates. (Appendix III.B) 
TABLE I 

COMPARISON OF VISITED GROUP AND OUR GROUP 

Characteristic                         Our Group    Visited Group 

Number                                 30           64 
Sex 
Mean Grade Point 
Mean Percentile, American Council 
  Psychological Examination 
Mean Percentile in High School 

TABLE II 

COMPARISON OF GROUP WITH 1948 AND 1949 GROUPS 
ON SCHOLARSHIP 

Characteristic             Sample 

Number                     29 
Mean 
Standard Deviation 

TABLE III 

MEANS AND STANDARD DEVIATIONS OF VARIABLES (COURSE I) 
(N = 113) 

          Age      P.R.     S.A.     P.T.     F. 

Mean      15.15    39.59    6.27     13.30    17.34 
S.D.      0.613    26.01    2.32     10.69    12.52 
3. English: Smith, D. V. and McCullough, C. 
M. Essentials of English Test, Form A, 1939, 
Minneapolis, Educational Test Bureau. (Appendix 
III.C) 

4. General Science: Educational Testing Service, 
Cooperative Science Test, Form Y, 
1948, New York, Educational Testing Service. 
(Appendix III.D) 

5. Algebra: American Council on Education, 
Cooperative Algebra Test, Form 8, 1942, New 
York, Cooperative Test Service. (Appendix 
III.E) 

6. World History: American Council on Education, 
Cooperative World History Test, Form 
Y, 1948, New York, Cooperative Test Service. 
(Appendix III.F) 


Sources and Collection of the Data 





The methods employed in collecting the data 
varied considerably from one criterion to another. 
The sources and methods of collection are 
therefore presented separately for each criterion. 

The rating of the teacher by his principal was 
secured on the one page form of the Wisconsin 
Adaptation of the M-Blank. A sample of the blank 
and the letter sent with it to the supervising 
principal are in Appendix II.C. The principal 
is expected to list what he considers to be the 
outstanding elements of strength and weakness 
in the work of the teacher and to make any comments 
which he deems appropriate. The principal 
was also asked to give an overall evaluation. 
In this particular study the general evaluation 
only is considered and is recorded in column 
M of Appendix I.B. The ratings run from 
1 for superior to 5 for inferior. Teachers and 
courses are indicated by coded numbers as a precaution 
lest the rating of a teacher be used to 
his disadvantage by some unscrupulous individual. 

As a result of the fine cooperation of the par- 
ticipating principals, the ratings of the teachers 
were secured from five pupils taught by each 
teacher. In several schools the administration 
felt that pupil ratings were not desirable for 
their system. These wishes were respected 
and the appropriate spaces in the data sheet re- 
main blank. Five forms and five stamped ad- 
dressed envelopes were sent to the principals 
with the request that five pupils be chosen at 
random to rate their teacher. The instructions 
sent out with all rating scales were that the 
rater remain anonymous. The form consists 
of 9 questions on which the pupil is asked to 
rate the teacher. In an attempt to overcome 
the tendency to overrate, the pupils were asked 
to divide all the teachers they have had into 
five equal groups according to excellence. The 
pupil is then asked to indicate into which fifth the 
present teacher fits. In recording the data in 
Appendix I.B, columns 2 to 6 inclusive, 1 indicates 
the best fifth, 2 the second fifth, 
and so on to 5 for the worst fifth. 

The peer rating form (Appendix II.D) is very 
similar to the pupil rating form, except that some 
subscales for rating differ. This form was sent to 
two teaching colleagues of the teacher. The results 
are tabulated in Appendix I.B. 

A self rating form was also sent to each teacher 
with instructions to send in a rating of his 
own teaching. The form, which was an adaptation 
of Bryan’s Pupil Rating Scale, is very similar 
to the peer and pupil scales (Appendix II.E). 
The same scale of values is used in recording 
the data as in the other two cases (Appendix 
I.B). 

Since the Administrative X criterion is confidential, 
no information about it other than 
the method of scoring will be given here. The 
largest numerical score that may be attained 
is five. The higher score indicates the poorer 
teacher. 

There are 7 areas in which pupil achieve- 
ment was studied: English 9, English 11, Typ- 
ing, Mathematics 7, Algebra, General Science, 
and Modern History. A few teachers had clas- 
ses in two areas and were tested in both. Since 
achievement in these subject matter areas was 
to be compared on the basis of standard scores, 
a minimum of five classes in each area was de- 
sired. Several of the course areas were below 
this minimum. Consequently, several test 
groups were set up in several cities using 
teachers who were comparable in training and 
experience. Data for these groups are given 
in Appendix I.A, with letters rather than numbers 
as codes for the teachers. 

The pupil achievement program was carried 
on with the aid of the school principals. Tests 
were sent out with a letter (Appendix II. B) to 
the principal requesting that the test be admin- 
istered under his supervision. Most of the 
tests were given in December 1951. The tests 
were returned to the Teacher Personnel Research 
Committee, which scored them and sent 
copies of the scores to the teacher and princi- 
pal involved. In the spring tests were again 
sent to the principals with instructions to retest 
the group 70 or 75 school days after the initial 
testing, depending on the course. So far as 
can be determined, all tests were given on 
schedule except one. That test was given 68 
days after the first testing instead of the pre- 
scribed 70. No adjustment was made for this 
time discrepancy because it might possibly be 
used to identify the teacher. The initial test 
scores together with the measures of intelligence, 
age, and average scholastic achievement are tab- 
ulated by classes in Appendix I. A. 


The Treatment of the Data 





The intercorrelations of age, intelligence, 
and average scholastic standing were calculated 
for each course. These correlations were tabulated 
in a matrix. Using the Aitken Method of 
Pivotal Condensation (62:89), prediction equations 
were set up for each course to predict the 
pretest scores. The mean of the discrepancies 
between the actual scores and the predicted 
scores was obtained for each class. The class 
discrepancy was regarded as a measure of the 
effectiveness of that teacher. A standard score 
for a teacher was calculated by subtracting the 
mean discrepancy for the entire course from 
the mean discrepancy in the class and dividing 
the difference by the standard deviation of the 
mean discrepancies for the course. The teachers 
may be ranked by these standard scores if the 
assumption is made that the teachers in one 
course are as competent as those in another. 
Rather than make this assumption it was decided 
to assume that the norms provided with the tests 
are all equally applicable to their groups. A constant 
of 5 was added to each standard score, 
rendering it positive, so that multiplying the scores 
by a positive number leaves all values positive. 
The scores of the different groups should be comparable 
after they have been multiplied by the 
ratio of the mean score in the group to the mean 
of the norms for that group. These scores may 
now be considered to rank the teachers according 
to the achievements of their pupils after compensation 
for differences in age, intelligence, and 
average scholastic standing. An identical procedure 
was used for the final test score except 
that final score was substituted for initial score, 
and that the pretest score was used as a predictive 
variable. In both cases there were teachers 
who were teaching classes in two courses tested. 
The two ratings which such a teacher achieved were 
averaged in determining his ranking. 
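
The standard-score step can be illustrated as follows. This is a sketch under stated assumptions, not the study's worksheet. Because the class discrepancies are not fully legible, it uses the six Course I mean gains from the table of mean gains for each class, to which the same technique was later applied. The population form of the standard deviation is an assumption, although it does reproduce the Course I figures (4.43 and 2.88) given in the table of course means and standard deviations. The `norm_ratio` parameter is a hypothetical stand-in for the ratio of group mean to norm mean, since the norms themselves are not reported.

```python
from statistics import mean, pstdev

# Mean gains for the six Course I classes, keyed by teacher code.
course_one_gains = {
    "27": 6.40, "A": 3.00, "B": 5.87,
    "28": 8.75, "29a": 0.00, "29b": 2.53,
}


def shifted_standard_scores(class_values, shift=5.0, norm_ratio=1.0):
    """Standard score within the course, plus a constant, times a norm ratio.

    norm_ratio is hypothetical here (group mean / norm mean); 1.0 leaves
    the shifted standard scores unchanged.
    """
    centre = mean(class_values.values())
    spread = pstdev(class_values.values())  # population S.D. (assumed)
    return {
        teacher: ((value - centre) / spread + shift) * norm_ratio
        for teacher, value in class_values.items()
    }


scores = shifted_standard_scores(course_one_gains)
ranking = sorted(scores, key=scores.get, reverse=True)  # best class first
```

Adding the constant and multiplying by a positive ratio changes the scale but not the order of teachers within a course; the ratio matters only when courses are pooled.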

In working with pupil achievement as a meas- 
ure of teaching effectiveness, it is common to 
consider the change produced rather than the 
status. If the initial score is subtracted from 
the final score, the difference is the gain score. 
In this investigation the means of the gains scored 
in the various segments of the pretest range 
were plotted against the segments of the pretest 
scores for each course. If no appreciable relationship 
was found, the gain scores were left unchanged. 
In some of the cases a trend was found 
for the low pretest scores to be associated with 
the high gains. To compensate for this, each 
gain score was multiplied by the ratio of the 
mean of all the gains to the mean of the gain 
scores in the segment of pretest scores where 
the corresponding pretest score is located. 
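
The adjustment just described can be sketched as below. The segment width and the four pupil scores are invented for illustration; the text does not say how wide the pretest segments were. The sketch also assumes no segment has a mean gain of zero, since the ratio would then be undefined.

```python
from statistics import mean


def adjusted_gains(pretest, final, segment_width):
    """Scale each gain by (mean of all gains) / (mean gain in its pretest segment)."""
    gains = [f - p for p, f in zip(pretest, final)]
    overall_mean = mean(gains)
    # Assign each pupil to a pretest segment of equal width (hypothetical choice).
    segment_of = [int(p // segment_width) for p in pretest]
    by_segment = {}
    for seg, gain in zip(segment_of, gains):
        by_segment.setdefault(seg, []).append(gain)
    segment_mean = {seg: mean(values) for seg, values in by_segment.items()}
    return [
        gain * overall_mean / segment_mean[seg]
        for seg, gain in zip(segment_of, gains)
    ]


# Hypothetical scores: two low-pretest pupils with large gains, two high with small.
pretest = [2, 4, 12, 14]
final = [12, 10, 16, 16]
adjusted = adjusted_gains(pretest, final, segment_width=10)
```

After adjustment every segment has the same mean gain as the whole group, which removes the advantage of classes that began with low pretest scores.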

The intercorrelations of the adjusted gains 
with age, intelligence, scholastic achievement, 
and pretest scores were used to determine a 
prediction equation for gains. The mean dis- 
crepancy of actual gain minus predicted gain is 
used as a discrepancy score for each teacher. 
Teachers were rated on these discrepancy scores 
by the same process as on the pretest and final 
score discrepancies. 

In some of the cases it was found that the 
prediction equation accounted for less than 25 
percent of the variance. In these cases the ac- 
tual mean gains were used as measures of the 
teachers’ effectiveness after the mean gains 
had been converted to rankings by the standard 
score techniques mentioned before. 

After ratings and rankings of the teachers 
had been obtained by all criteria, the relationships 
among these were determined by the Pearson 
product moment correlation formula. 
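
For reference, the product moment correlation can be computed as in the following sketch. The function name and the sample values are illustrative only and do not come from the study's data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical values for five teachers on two criteria.
criterion_a = [1, 2, 3, 4, 5]
criterion_b = [2, 4, 6, 8, 10]
r = pearson_r(criterion_a, criterion_b)
```

Because the rating scales assign low numbers to the better teachers while the achievement scores assign high numbers to them, agreement between a rating and an achievement criterion would show up as a negative coefficient.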


SECTION IV 
THE ANALYSIS OF THE DATA 


THE MAJOR purpose of this investigation 
was to study the relationships among certain 
criteria of teaching effectiveness. Data regard- 
ing the success of thirty graduates of the Col- 
lege of Education, University of Wisconsin, in 
the second year of service were collected. The 
types of data collected with a short explanation 
of the symbols employed and the scoring system 
of each will be reviewed. The original data are 
presented in Appendix I. 

1. A rating of teacher by principal on a one 
page form of the Wisconsin Adaptation of the 
M-Blank. Symbol M; scoring range of one to 
five with the lower score indicative of the better 
teacher. 

2. A rating of the teacher by pupils on an 
adaptation of the Bryan Reaction to Instruction 
Questionnaire. Symbol Pu; scoring same as 
above for M. Total of five pupil ratings used 
in calculations. 

3. A rating of the teacher by himself on a 
specially adapted self evaluation form. Symbol 
Se; scoring same as M above. 

4. A rating of the teacher by his colleagues 
on the peer-evaluation form. Symbol Pe; scor- 
ing as M above. Sum of two peer ratings used 
in calculations. 

5. A rating of teacher by another agency on 
a special Administrative Criterion. Symbol 
AX; scoring range of five upward. Lower score 
indicates better teacher. 

6. Pupil scores on an initial administration 
of standardized achievement tests. Courses 
TABLE IV 
INTERCORRELATIONS OF VARIABLES (COURSE I) 

          Age      P.R.     S.A.     P.T.     F.       G. 
Age 
P.R.      -.272 
S.A.      0.270    -.585 
P.T.      -.138    0.597    -.550 
F.        -.223    0.664    -.555    0.848 
G.        0.349    0.291    -.169    -.136    0.520 
TABLE V 
PREDICTION EQUATIONS FOR COURSE I 

P.T.' = 0.17 P.R. - 1.40 S.A. + 15.3 
F.'   = 0.11 P.R. - .24 S.A. + 0.81 P.T. + 3.8 
G.'   = 5.00 Age + 0.11 P.R. - 76.0 
TABLE VI 
MEANS AND STANDARD DEVIATIONS OF VARIABLES (COURSE II) 

Item      Age      P.R.     S.A.     P.T.     F.       G. 
Mean      15.24    51.75    5.29     9.23     15.37    6.15 
S.D.      0.875    25.62    2.36     4.75     4.97     4.09 

52 JOURNAL OF EXPERIMENTAL EDUCATION (Vol. XXIII 


TABLE VII 

INTERCORRELATIONS OF VARIABLES (COURSE II) 

          Age           P.R.          S.A.          P.T.          F.            A.G. 
Age 
P.R.      [illegible] 
S.A.      -.009         [illegible] 
P.T.      -.342         0.381         [?].388 
F.        -.1[??]       0.369         -.372         [illegible] 
A.G.      0.056         0.174         -.205         0.003         [illegible] 
TABLE VIII 
PREDICTION EQUATIONS FOR COURSE II 

P.T.' = -1.70 Age + 0.054 P.R. - .332 S.A. + 34.2 
F.'   = 0.019 P.R. - 0.415 S.A. + 0.548 P.T. + 11.50 

A.G.' not used; cannot account for 10% of variance. 











TABLE IX 
MEANS AND STANDARD DEVIATIONS OF VARIABLES (COURSE III) 
(N = 99) 

          Age      I.Q.      S.A.     P.T.     F.       A.G. 
Mean      12.40    105.64    5.75     46.13    54.54    8.555 
S.D.      0.62     12.90     2.67     10.63    10.12    4.58 
TABLE X 

INTERCORRELATIONS OF VARIABLES (COURSE III) 

          Age           I.Q.          S.A.          P.T.          F.            A.G. 
Age 
I.Q.      [illegible] 
S.A.      0.361         [illegible] 
P.T.      -.107         0.481         [illegible] 
F.        -.238         0.494         -.704         [illegible] 
A.G.      -.074         0.128         -.004         -.074         [illegible] 
TABLE XI 
PREDICTION EQUATIONS FOR COURSE III 

P.T.' = 1.732 Age - 3.288 S.A. + 43.562 
F.'   = 0.085 I.Q. + 0.095 S.A. + 0.797 P.T. + 8.279 

A.G.' not computed; can only account for about four percent of variance. 














TABLE XII 
MEANS AND STANDARD DEVIATIONS OF VARIABLES (COURSE IV) 
(N = 95) 

          Age      P.R.     S.A.     P.T.     F.       G. 
Mean      15.02    65.65    5.37     14.50    21.63    7.13 
S.D.      0.951    24.27    2.59     7.53     10.23    6.18 
TABLE XIII 

INTERCORRELATIONS OF VARIABLES (COURSE IV) 

          Age           P.R.          S.A.          P.T.          F.            G. 
Age 
P.R.      [illegible] 
S.A.      0.044         [illegible] 
P.T.      -.122         0.609         [illegible] 
F.        -.011         0.557         -.667         [illegible] 
G.        0.130         0.180         -.366         0.105         [illegible] 
are coded by number. High score indicates attainment 
of desired objectives. Symbol for initial 
test score is P.T. 

7. Final scores on certain standardized tests. 
Same as above with F to signify final. Scored 
as number 6. 

8. Pupil Age measured in completed integral 
units. Symbol A. 

9. Pupil rank in intelligence. Symbols used 
are P.R. for percentile ranks on statewide testing 
program involving Henmon-Nelson Test of 
Mental Ability and I.Q. for intelligence quotient 
on Otis form A. 

10. Average scholastic standing in high 
school. Symbol is S.A.; scored on scale of 1 
to 12, with 1 corresponding to A, 2 to A-, 3 to 
B+, and so on to 12 for F. 

From items 6 through 10, based upon tests, 
grades, and school records, it was hoped that 
three criteria of teaching success could be developed. 
The test criteria are based on discrepancies 
between actual and expected scores on 
the pretest, final, and gains. These three were 
to be compared separately with the ratings of items 
1 through 5 by usual correlational techniques. 
Since the development of criteria of 
teaching effectiveness based on pupil achievement 
involves considerable detail and computation, 
priority is given to that treatment. 

The means and standard deviations of the 
ages, percentile ranks in intelligence or intelli- 
gence quotients, average scholastic standings, 
initial and final test scores on the achievement 
examinations, and gains or adjusted gains are 
listed in Tables III, VI, IX, XII, XV, XVIII, and 
XXI. The number of cases on which each of 
these statistics was computed is given in each 
table. 

The intercorrelations between these variables 
are given for each course in Tables IV, VII, X, 
XIII, XVI, XIX, and XXII. For some of the variables 
in Table XXII, two sets of correlations 
are given. One of the participating classes was 
tested only one time. The upper correlation in 
each cell is based on 103 cases while the lower 
used only 83. The correlations for the 103 were 
used in preparing the prediction equations for the 
initial scores (P. T.') for course VII. The final 
and gain predictions for that course are based 
on the 83 cases. 

Beta coefficients for prediction equations in 
standard score form were computed from the intercorrelations 
of each course group by the Aitken 
Method of Pivotal Condensation (62:89-95). The 
standard score equations were subsequently converted 
to the raw score form of the regression 
equation by replacing the means and standard deviations 
in the definition of standard score by 
the computed values given in the tables referred 
to above. Prediction equations in raw score 
form are listed in Tables V, VIII, XI, XIV, XVII, 
XX, and XXIII. 
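
The conversion from standard score form to raw score form can be checked against the Course I pretest equation. The sketch below is not the pivotal condensation method used in the study; it substitutes the closed-form solution for two predictors, which gives the same beta coefficients in that special case. Inputs are the Course I correlations, means, and standard deviations as printed in Tables III and IV; the function names are illustrative.

```python
def two_predictor_betas(r_y1, r_y2, r_12):
    """Standard-score regression weights for two predictors (closed form)."""
    denom = 1.0 - r_12 ** 2
    return (r_y1 - r_y2 * r_12) / denom, (r_y2 - r_y1 * r_12) / denom


def raw_score_equation(betas, sd_y, sds, mean_y, means):
    """Convert beta weights to raw-score weights and a constant term."""
    weights = [beta * sd_y / sd for beta, sd in zip(betas, sds)]
    constant = mean_y - sum(w * m for w, m in zip(weights, means))
    return weights, constant


# Course I: criterion P.T., predictors P.R. (1) and S.A. (2).
betas = two_predictor_betas(r_y1=0.597, r_y2=-0.550, r_12=-0.585)
weights, constant = raw_score_equation(
    betas,
    sd_y=10.69, sds=[26.01, 2.32],
    mean_y=13.30, means=[39.59, 6.27],
)
```

With these inputs the weights come out near 0.17 for P.R. and -1.41 for S.A., with a constant near 15.3, which agrees closely with the equation P.T.' = 0.17 P.R. - 1.40 S.A. + 15.3 in Table V. The small difference in the S.A. weight is of the size expected from rounding in the printed correlations.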

At the time that the beta coefficients were 
computed, multiple correlations were also cal- 
culated. If one adds variables to an equation 
of this type, one expects the multiple correla- 
tion to increase. If it does not, the added var- 
iable has not improved the prediction that can 
be made by the use of the regression equation. 
In this computation, if the multiple correlation 
was not raised by an increment of 0.02 on the 
addition of a variable, the variable was omitted 
from the equation. Consequently, some equa- 
tions have only two independent variables while 
others have three or four. If this general pro- 
cedure failed to raise the multiple correlations 
above 0.3, the regression equation was not set 
up in raw score form because it fails to take 
account of sufficient variance to make the oper- 
ation worth the effort involved. 
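
The retention rule can be illustrated with the Course I pretest correlations from Table IV. As before, this is a sketch that uses the closed-form multiple correlation for two predictors, not the study's own computing routine; `keep_added_variable` is a hypothetical name for the 0.02 rule stated above.

```python
from math import sqrt


def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation of a criterion with two predictors."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return sqrt(r_squared)


def keep_added_variable(r_before, r_after, min_increment=0.02):
    """Retain the added variable only if R rises by at least min_increment."""
    return (r_after - r_before) >= min_increment


# Course I pretest: P.R. alone, then P.R. and S.A. together.
r_pr_only = abs(0.597)
r_pr_and_sa = multiple_r_two_predictors(0.597, -0.550, -0.585)
sa_retained = keep_added_variable(r_pr_only, r_pr_and_sa)
```

Here the multiple correlation rises from about 0.60 to about 0.65, so S.A. would be retained under the rule. The result is also close to the .64 that heads the list of pretest multiple correlations in Table XXIV, if that entry belongs to Course I.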

As a result of the consideration mentioned 
in the preceding paragraph, only three predic- 
tion equations were set up for predicting gains. 
Later it was decided to use only such equations 
whose multiple correlations exceeded 0.5. This 
eliminated all but one gain predictor. Since this 
one did not differ materially from 0.5 it was de- 
cided to omit it also. All the multiple correla- 
tions for the various prediction equations are 
given in Table XXIV. 

Although it was not feasible to measure teaching 
effectiveness on the discrepancy between 
the actual gain and the predicted gain score, it 
was felt that it would be justifiable to base judg- 
ment on the gains alone, inasmuch as gains are 
the outcomes that are desired. Evidently none 
of the measures used correlated highly enough 
with gain to possess predictive value. Since we 
do not have the necessary means for predicting 
gains, it is assumed that gains are equally likely 
to appear in each class if we disregard the influence 
of the teacher. Accordingly the assessment 
of teaching effectiveness in terms of actual 
gains or gains adjusted for pretest handicap was 
substituted for the proposed criterion of the dis- 
crepancy between actual and predicted mean 
gains. 

The mean discrepancies for the different 
classes between the pretest and predicted pretest 
score are summarized in Table XXV. The 
same type of information concerning the final 
score discrepancies is located in Table XXVI. 
The letter D is used to indicate the discrepancy 
while the subscripts p and f refer to the pretest 
and final respectively. 

When the means of the gains for each 
pretest score were graphed versus the pretest 
scores, it became apparent that definite relationships 
existed between pretest and gain for 
Courses II, III, VI, and VII. In all of these 
courses high gains tended to be associated with 
low pretest scores. Accordingly, the baseline 
TABLE XIV 
PREDICTION EQUATIONS FOR COURSE IV 

P.T.' = 0.125 P.R. - 1.15 S.A. + 12.47 
F.'   = 0.018 P.R. - 1.10 S.A. + 0.82 P.T. + 14.44 
G.'   = -.004 P.R. - .89 S.A. + 12.14 

TABLE XV 
MEANS AND STANDARD DEVIATIONS OF VARIABLES (COURSE V) 
(N = 105) 

          Age      P.R.     S.A.     P.T.     F.       G. 
Mean      14.39    55.92    5.43     22.49    29.09    6.60 
S.D.      0.67     26.02    2.45     9.14     12.53    7.95 
TABLE XVI 

INTERCORRELATIONS OF VARIABLES (COURSE V) 

          Age           P.R.          S.A.          P.T.          F.            G. 
Age 
P.R.      [illegible] 
S.A.      0.356         [illegible] 
P.T.      -.093         0.424         [illegible] 
F.        -.144         0.521         -.514         [illegible] 
G.        -.119         0.333         -.256         0.070         [illegible] 
TABLE XVII 
PREDICTION EQUATIONS FOR COURSE V 

P.T.' = 0.079 P.R. - 1.317 S.A. + 25.24 
F.'   = 0.093 P.R. - .475 S.A. + .887 P.T. + 6.51 
G.'   = 0.085 P.R. - .3245 S.A. + 3.63 
TABLE XVIII 
MEANS AND STANDARD DEVIATIONS OF VARIABLES (COURSE VI) 
(N = 147) 

          Age      P.R.     S.A.     P.T.     F.       G. 
Mean      13.95    49.84    5.85     87.48    94.61    7.04 
S.D.      0.54     25.85    2.48     15.00    12.98    12.24 
TABLE XIX 

INTERCORRELATIONS OF VARIABLES (COURSE VI) 

          Age           P.R.          S.A.          P.T.          F.            A.G. 
Age 
P.R.      [illegible] 
S.A.      0.178         [illegible] 
P.T.      -.1[?]2       0.534         [illegible] 
F.        -.208         0.549         -.565         [illegible] 
A.G.      -.112         0.146         -.103         -.041         [illegible] 
TABLE XX 
PREDICTION EQUATIONS FOR COURSE VI 

P.T.' = 0.200 P.R. - 2.415 S.A. + 91.58 
F.'   = 0.072 P.R. - .13 S.A. + .550 P.T. + 47.24 

A.G.' not used; cannot account for 10 percent of variance. 
TABLE XXI 

MEANS AND STANDARD DEVIATIONS OF VARIABLES (COURSE VII) 

          Age      P.R.     S.A.     P.T.     F.        A.G. 
Mean      16.37    50.00    5.69     93.50    101.02    7.34 
S.D.      0.74     28.85    2.29     18.43    16.78     14.71 
N         83       83       83       83       83        83 
Mean      16.36    48.86    5.71     94.71 
S.D.      0.74     29.34    2.30     18.72 
N         103      103      103      103 
TABLE XXII 
INTERCORRELATIONS OF VARIABLES (COURSE VII) 

          Age*          P.R.*         S.A.*         P.T.          F.            A.G. 
Age 
P.R.*     [illegible] 
          [?].340 
S.A.*     0.286         [illegible] 
          0.326         -.520 
P.T.*     [?].281       0.490         [illegible] 
          [?].280       0.484         -.518 
F.        [?].408       0.607         -.595         [illegible] 
A.G.      [?].060       0.191         -.098         -.063         [illegible] 

*Two sets of correlations given; upper number based on 103 cases, lower set 
based on 83 cases. 
TABLE XXIII 

PREDICTION EQUATIONS FOR COURSE VII 

P.T.' = 0.177 P.R. - 3.190 S.A. + 104.260 
F.'   = -2.743 Age + 0.105 P.R. - .199 S.A. + 0.620 P.T. + 87.260 

A.G.' not used; cannot account for ten percent of variance. 





TABLE XXIV

CALCULATED MULTIPLE CORRELATIONS FOR
REGRESSION EQUATIONS

P.T.

.64
.51
.80
.70
.51
.64
.59

*Indicates multiple correlations less than 0.3
TABLE XXV


MEAN DISCREPANCIES BETWEEN PRETEST FOR EACH CLASS 
AND PREDICTED PRETEST 





Dp 


= 68 
. 93 
. 34 
. 66 
. 94 
. 26 
. 00 
. 23 
.50 
. 98 
. 07 
. 85 
.97 
. 23 
. 74 
. 55 
. 08 
.10 
. 44 
.27 
TABLE XXVI
MEAN DISCREPANCY BETWEEN FINAL AND PREDICTED FINAL FOR EACH CLASS 





Course Teacher Ds Course Teacher Dr 





I 27 +. 17 +0. 73 
A 1. 19 +1.91 
B +1, +0. 67 
28 +2. -6. 00 
29a -2. -0. 71 
29b -1. +1. 91 
20 -0. +3, 75 
22 +0, -2.20 
23 +0. +2. 65 
24 -0. +2. 32 
25 +0. -2.54 
26 -0. +4, 54 
31 -0. -1. 67 
Cc -0. -3.59 
32 +0. +0. 74 
D +0. +1. 03 
E +1. +1. 44 
18 -1. -2.10 


F 


— dw 
QaQoorrK UOre WwW -~I D 
TABLE XXVII

MEAN GAINS OR ADJUSTED MEAN GAINS FOR EACH CLASS

Course    Teacher    G.
I         27         6.40
          A          3.00
          B          5.87
          28         8.75
          29a        0.00
          29b        2.53
II        20         6.40
          22         6.62
          23         7.52
          24         5.98
          25         5.89
          26         4.77
III       31         8.49
          C          6.30
          32         9.70
          D          8.39
          E          9.92
IV        18         4.34
          F          5.81
          17         7.40
          19         9.52
          16         8.55
V         14         0.28
          16         5.52
          12         8.59
          15         11.21
          G          5.90
VI        4          10.73
          7          9.37
          3          5.97
          1          12.33
          8          4.50
          5          1.79
VII       1          4.87
          21         10.77
          10         6.72
          9          6.07
          6          9.89





TABLE XXVIII

MEANS AND STANDARD DEVIATIONS OF MEAN DISCREPANCIES AND GAINS
FOR EACH COURSE

          Discrepancies      Discrepancies      Mean Class
          on Pretest         on Final           Gains
Course    Mean     S.D.      Mean     S.D.      Mean     S.D.
I         -0.23    0.99      -0.02    1.81      4.43     2.88
II        0.13     1.12      0.04     0.62      6.20     0.83
III       -0.29    2.52      0.03     0.73      8.56     1.29
IV        -0.31    2.33      0.02     1.41      7.12     1.86
V         -0.39    2.15      -0.65    3.38      6.30     3.64
VI        -0.14    3.46      0.29     3.03      7.45     3.68
VII       0.75     5.03      0.28     1.25      7.66     2.26
of the pretest scores (abscissa) was divided into
segments. A mean gain ordinate was calculated
for each segment. Each gain score in the
course was multiplied by the ratio of the mean
gain for the segment of the abscissa which contains
its pretest score. Thus each gain score
in these four courses was adjusted for inequalities
in difficulty at various levels on the test.
Table XXVII shows the mean gains for classes
in Courses I, IV, and V and the adjusted mean
gains for each class in Courses II, III, VI, and
VII.
The discrepancy and mean gain scores serve
to rank the teachers on these particular criteria
in the various courses. A common device for
comparing groups of this kind is to reduce the
scores to standard score form with zero mean
and unit standard deviation. Any score in a
group of scores can be converted to standard
score form by subtracting the mean of the group
and dividing this difference by the standard
deviation of the group. The means and standard
deviations of the discrepancies and mean gains
of all classes for all courses are listed in
Table XXVIII. When these means and standard
deviations were used to convert the discrepancy
and gain scores into standard scores, the standard
scores tabulated in Table XXIX were obtained
for the three criteria of teaching effectiveness.
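
The conversion to standard score form can be sketched as follows. This is a minimal illustration, not the author's computing procedure; the function and variable names are assumptions, while the numbers are the Course I gain mean and standard deviation from Table XXVIII and the mean gain tabulated for Teacher 27.

```python
def standard_score(value, group_mean, group_sd):
    """Standard score: (value - group mean) / group standard deviation."""
    return (value - group_mean) / group_sd

# Course I mean class gains (Table XXVIII): mean 4.43, S.D. 2.88.
course_i_gain_mean = 4.43
course_i_gain_sd = 2.88

# Mean gain tabulated for Teacher 27 in Course I: 6.40.
z_gain_27 = standard_score(6.40, course_i_gain_mean, course_i_gain_sd)

# Gives about 0.68; Table XXIX lists +0.69, the small difference
# presumably reflecting rounding in the published means.
print(round(z_gain_27, 2))
```

A score equal to the group mean converts to zero, and scores below the mean convert to negative values.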

Standard score distributions always contain
negative as well as positive numbers. Under
ordinary circumstances the signed numbers are
perfectly acceptable. However, in this investigation
a weighting correction was made to compensate
for the possibility that the more capable
teachers may be teaching in one course while
mediocre teachers are clustered in another. The
plan for weighting was to multiply the score for
each teacher by the ratio of the mean score for
all pupils in that course to the mean score as
reported for the norms. Applied to signed scores,
such a plan would project the lowest score in a
group down in proportion to the rating of the
group on the norms. However, if a constant is
added which will change all the standard scores
to positive values, this objection is removed.
A constant of five was added to each standard
score. The resulting scores are listed in Table XXX.

Table XXXI shows the means attained by the 
pupils taught by the teachers being evaluated on 
the standardized achievement examinations. The 
means of the norms are furnished for compari- 
son. When the scores of Table XXX were mul- 
tiplied by the ratio of the appropriate means 
from Table XXXI, the scores were changed to 
criteria scores as listed in Table XXXII. 
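
The two steps just described, adding the constant and weighting by the ratio of means, can be sketched as below. The function name and argument names are illustrative assumptions; the figures are taken from Tables XXIX and XXXI, and the results agree with the corresponding entries of Table XXXII to two decimals.

```python
def criterion_score(standard_score, course_mean, norm_mean, constant=5.0):
    """Shift a standard score by a constant, then weight it by the
    ratio of the course mean to the norm mean."""
    shifted = standard_score + constant
    return shifted * (course_mean / norm_mean)

# Teacher 27, Course I, pretest criterion:
# standard score -0.59; course mean 17.34; norm mean 21.9.
c_p_27 = criterion_score(-0.59, 17.34, 21.9)

# Teacher 4, Course VI, pretest criterion:
# standard score +2.17; course mean 91.04; norm mean 89.0.
c_p_4 = criterion_score(2.17, 91.04, 89.0)

print(round(c_p_27, 2), round(c_p_4, 2))
```

Because every shifted score is positive, a course whose pupils stand above the norms raises all of its teachers' scores, and a course below the norms lowers them.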

The intercorrelations of these three criteria
scores and the five ratings by principal, teacher,
pupils, peers, and administrative criterion
X are summarized in Table XXXIII. These correlations
were computed by the Pearson product-moment
formula. Since low numbers on the
rating scales indicated superior teaching, whereas
high numbers indicated superior teaching on the
achievement criteria, the signs of correlations
obtained using one criterion from each group have
been reversed to give the usual interpretation
to the correlations.

If samples of size thirty had been randomly 
drawn from an infinite population whose true 
correlation is zero, the standard error would 
be 0.18. Only in about five percent of such 
samples would one expect to find correlations 
whose absolute value is greater than 0.36. In 
this set of correlations based on a selected
group, only seven correlations are of the magnitude
to cast doubt upon the possibility of their
having arisen as chance fluctuations from a true
correlation of zero. The seven are:


1. Correlation of M-Blank with Administrative Criterion X
2. Self rating with principal's M-Blank
3. Peer with pupil
4. Peer with Administrative Criterion X
5. Pretest rating with final rating on pupil achievement
6. Pretest with adjusted gain criterion
7. Final with adjusted gain
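
The figures 0.18 and 0.36 used to screen these correlations can be reproduced with a short calculation. The text does not state which formula was used; the sketch below assumes the large-sample approximation 1/sqrt(N) for the standard error of a correlation when the true value is zero, which gives 0.18 for N = 30. The alternative 1/sqrt(N - 1), also common in texts of the period, gives about 0.19.

```python
from math import sqrt

n = 30  # number of teachers in the sample

# Assumed approximation: standard error of r when the true correlation is zero.
standard_error = 1 / sqrt(n)

# About five percent of sample correlations fall beyond two standard errors.
threshold = 2 * round(standard_error, 2)

print(round(standard_error, 2), round(threshold, 2))
```

Under this assumption a correlation must exceed 0.36 in absolute value before a true correlation of zero becomes doubtful.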


There may be many who question the assump- 
tion that the means on the norms furnished by 
standardized test publishers are equally appli- 
cable to all these teachers. It may be argued 
that the intercorrelations among these criteria 
based on pupil achievement are spuriously high 
in that the ratios of the means move subgroups 
of the entire group in the same direction for all 
three criteria. In consideration of this valid 
objection the writer calculated the correlations 
among the criteria scores listed in Table XXX. 
This uses the assumption that the teachers in 
every class are comparable with those of all 
the other classes. Using this procedure the 
correlations obtained are as follows: 


1. Pretest with final              0.16
2. Pretest with adjusted gain     -0.01
3. Final with gain                 0.69


The only one of these that could be considered 
significantly different from zero is the correl- 
ation of final score with gain. 

As a matter of interest it was decided to 
check the consistency of the ratings by pupils 
and peers. In the data sheet the ratings by the 
pupils are numbered from one to five. These 
are numbers that were assigned to the rating 
sheets before they were sent to the pupils. The 
ratings in column Pu, represent the ratings of 
their respective teachers by the pupils who re- 
TABLE XXIX

STANDARD SCORES FOR TEACHERS IN COURSES ON THE THREE CRITERIA

Course    Teacher     Zp       Zf       Zg
I         27          -0.59    +0.49    +0.69
          A           +1.17    -0.71    -0.49
          B           +1.60    +1.05    +0.51
          28          -0.43    +1.28    +1.52
          29a         -0.72    -1.45    -1.56
          29b         -1.04    -0.65    -0.66
II        20          +1.01    -0.68    +0.24
          22          -0.98    +1.11    +0.51
          23          -1.22    +0.31    +1.59
          24          -0.76    -1.65    -0.27
          25          +1.07    +1.18    -0.37
          26          +0.88    -0.26    -1.72
III       31          +1.69    -1.22    -0.04
          C           +0.02    -0.97    -1.75
          32          -1.32    +0.23    +0.88
          D           -0.50    +0.48    -0.13
          E           +0.15    +1.49    +1.05
IV        18          +0.61    -1.38    -1.49
          13 and F    -1.77    -0.94    -0.76
          17          +0.01    +0.50    +0.15
          19          +1.21    +1.34    +1.29
          16          -0.08    +0.46    +0.77
V         14          +0.13    -1.58    -1.65
          16          -0.22    -0.02    -0.21
          12          -0.15    +0.76    +0.63
          15          +1.69    +1.30    +1.35
          13 and G    -1.44    -0.46    -0.11
VI        4           +2.17    +0.78    +0.89
TABLE XXX

STANDARD SCORES OF TEACHERS WITH CONSTANT FIVE ADDED

Course    Teacher    Sp      Sf      Sg
I         27         4.41    5.49    5.69
          28         4.57    6.28    6.52
          29*        4.12    3.95    3.89
II        20         6.01    4.32    5.24
          22         4.02    6.11    5.51
          23         3.78    5.31    6.59
          24         4.24    3.35    4.73
          25         6.07    6.18    4.63
          26         5.88    4.74    3.28
III       31         6.69    3.78    4.96
          32         3.63    5.23    5.88
IV        18         5.61    3.62    3.51
          13**       3.23    ....    ....
          17         5.01    5.50    5.15
          19         6.21    6.34    6.29
          16         4.92    5.46    5.77
V         14         5.13    3.42    3.35
          16         4.78    4.98    4.79
          12         4.85    5.76    5.63
          15         6.69    6.30    6.35
          13**       3.56    ....    ....
VI        4          7.17    5.78    5.89
          7          4.50    5.67    5.52
          3          4.54    4.07    4.60
          1          4.47    6.40    6.33
          8          4.25    4.36    4.20
          5          5.10    3.72    3.46
VII       11**       6.07    ....    ....
          1          6.38    5.37    3.77
          21         5.30    5.60    6.38
          10         4.53    5.93    4.59
          9          3.70    3.10    4.30
          6          4.03    5.00    5.99

 *Average of scores for two sections
**Can be evaluated over one semester only. Only pretest used.
TABLE XXXI

COMPARISON OF MEANS OF NORMING AND COURSE
POPULATIONS

Course    Course Means    Norm Means
I         17.34           21.9
II        9.23            10.0
III       54.54           47.65
IV        21.62           21.9
V         29.17           20.2
VI        91.04           89.0
VII       97.87           104.0
TABLE XXXII

TEACHER SCORES ON ACHIEVEMENT CRITERIA AFTER WEIGHTING

Course    Teacher    Cp      Cf      Cg
I         27         3.49    4.35    4.51
          28         3.62    4.97    5.16
          29         3.26    3.13    3.08
II        20         5.55    3.99    4.84
          22         3.71    5.64    5.09
          23         3.49    4.90    6.08
          24         3.91    3.09    4.37
          25         5.60    5.70    4.27
          26         5.43    4.38    3.03
III       31         7.66    4.33    5.68
          32         4.15    5.99    6.73
IV        18         5.54    3.58    3.47
          13         3.19    ....    ....
          17         4.95    5.43    5.09
          19         6.13    6.26    6.21
          16         4.86    5.39    5.70
V         14         7.39    4.93    4.83
          16         6.89    7.17    6.90
          12         6.99    8.30    8.11
          15         9.64    9.08    9.15
          13         5.13    ....    ....
VI        4          7.33    5.91    6.02
          7          4.60    5.80    5.64
          3          4.64    4.16    4.70
          1          4.57    6.54    6.47
          8          4.35    4.46    4.29
          5          5.21    3.80    3.54
VII       11         5.71    ....    ....
          1          6.00    5.05    3.55
          21         4.99    5.27    6.00
          10         4.26    5.58    4.32
          9          3.48    2.92    4.05
          6          3.79    4.71    5.64
ceived the blanks numbered one. The other
columns have ratings which were obtained in a
similar manner. The scores of the teachers on
the five pupil ratings were correlated. This
produced ten correlations, each of them based
on the judgments made by 52 pupils about 26
teachers. The separation of the pupils into
groups was predetermined as the blanks were
sent out. The range of the ten correlations was
0.00 to 0.74 with a median value of 0.42. In a
comparable manner the ratings of the teachers
by two groups of teaching colleagues were correlated
(r = 0.26). These two groups were
chosen as far as possible to represent the subject
matter area in which the teacher was working.
An attempt was made to divide the two
groups into men and women when the two original
forms were sent out. The consistencies in
rating, as measured here, are quite low.
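
The consistency check on the pupil ratings amounts to correlating every pair of the five rating columns, which yields ten coefficients. The sketch below illustrates the procedure only: the ratings are hypothetical values invented for six teachers, and the names are assumptions, so the printed range and median do not reproduce the figures reported above.

```python
from itertools import combinations
from math import sqrt
from statistics import median

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: key = number of the pupil blank (1 to 5),
# value = that pupil group's ratings of the same six teachers.
ratings_by_blank = {
    1: [2, 3, 1, 4, 2, 5],
    2: [3, 3, 2, 4, 1, 4],
    3: [2, 4, 1, 3, 3, 5],
    4: [1, 2, 2, 5, 2, 3],
    5: [3, 2, 1, 4, 4, 4],
}

# Five columns taken two at a time give ten pairs.
pairs = list(combinations(sorted(ratings_by_blank), 2))
correlations = [
    pearson_r(ratings_by_blank[a], ratings_by_blank[b]) for a, b in pairs
]

print(len(correlations), min(correlations), max(correlations),
      median(correlations))
```

The same function applied to the two columns of peer ratings would give the single coefficient reported for the teaching colleagues.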

Any evaluation of teaching effectiveness by
pupil achievement on standardized tests is invalid
insofar as the tests fail to measure the objectives
of instruction. In order to check on
this aspect of the validity of the testing program,
the reactions of each teacher to the tests used in
his classes were sought by means of a short
questionnaire (Appendix II.H). The teachers
were asked to rate the test on a five-point scale,
with 1 indicating a very satisfactory test and
5 a very unsatisfactory one. The teachers were also
invited to make any criticisms or comments they
felt appropriate. Of the thirty-two teachers invited
to rate and comment, twenty-seven responded.
Only a few did more than rate the test
and offer some brief comment. No second attempt
was made to secure the five additional ratings
because the school year was almost over.
The judgments and comments of the teachers are
tabulated by classes in Table XXXIV. Only those
comments which may be construed as criticisms
are tabulated because the original questionnaire
sought the criticisms only. In general, the criticisms
are confined to two categories: (1) tests
cover materials not taught in classes; and (2)
important objectives are excluded. The mean
value of the ratings of the tests (2.4) indicates
that the tests were slightly better than adequate.

When the intercorrelations among the three
measures of teaching effectiveness based on pupil
achievement are considered after weighting
for differences in accomplishment in the various
courses, there definitely appears to be a positive
correlation among them. The standard error
for samples of size thirty is 0.18 when randomly
drawn from an infinite population whose true
correlation is zero. All the intercorrelations
among the pretest, final, and gain criteria are at
least three times as large. It would appear
unreasonable to believe that the true correlations
are zero. As pointed out previously, if the
justification of weighting groups according to
their means is denied, the correlations drop so low
that they might be regarded as chance fluctuations
from zero for all except the final with gain
criterion (r = 0.69).

When the principal's evaluation on the M-Blank
was correlated with the other measures
of teaching effectiveness, it was found to correlate
0.57 with the administrative X criterion
and 0.43 with the teacher's self rating. All the
other correlations may be readily explained as
possible chance variations from a true zero.
The sum of the pupil ratings was found to correlate
0.38 with the sum of the evaluations of
the peers. No other significant correlations
were found for this variable. Peer rating correlated
0.40 with the administrative X criterion.
Self-evaluation was found to correlate
0.33 with the administrative X criterion. The
other relationships are too small to warrant
mention.

In summary, it would appear from the re- 
sults of this investigation that no appreciable 
relationships exist between rating criteria and 
pupil attainment criteria. Evidently the criter- 
ia evaluate different aspects of teaching effec- 
tiveness. There are low positive correlations 
among the constituent parts of each of these two 
groups of criteria. Whether one uses a com- 
posite criterion or several separate criteria 
will depend upon the purpose of the evaluation 
undertaken. It is not easy, even if sometimes 
desirable, to represent effectiveness in a com- 
plex activity such as teaching, by a single cri- 
terion. 


SECTION V 


SUMMARY, CONCLUSIONS, AND
IMPLICATIONS


Summary and Conclusions 





THE PURPOSE of this investigation 
was to determine the relationships among cer- 
tain criteria used to evaluate the teaching effec- 
tiveness of graduates of the School of Education, 
University of Wisconsin. The teaching effec- 
tiveness of a selected group was estimated by 
the use of eight different criteria. The rela- 
tionships among the specific criteria were ex- 
pressed by linear coefficients of correlation. 

The group studied in this investigation con- 
sisted of thirty teachers who received the Uni- 
versity Teachers Certificate in 1950 and are 
now employed in twenty-eight high schools in 
the State of Wisconsin. This selected group ap- 
pears to differ from the universe of graduates 
of the School of Education in that they (1) exhib- 
it a slight superiority in grade point average, 
(2) have one year of teaching experience, and 
(3) are teaching in selected subject matter areas.
With these exceptions, there is no reason for
not believing the group representative of the
graduates of the School of Education.

The subjects were evaluated on separate types 
of rating scales by principals, peers, pupils, the 
teacher himself, and an outside agency whose 
identity is confidential. In addition, the subject 
matter achievements of their pupils, as meas- 
ured on certain standardized tests, were used 
to develop three other criteria of teaching effec- 
tiveness. 

Complete data, including measures of intelli- 
gence, age, and average scholastic standing, as 
well as the scores on the initial and final admin- 
istration of the achievement examinations, were 
secured for seven hundred and forty-one pupils
taught by these teachers. On the basis of these 
data, the teachers were evaluated as to initial 
test status, final test status, and pupil gain. 
Pretest status was justified as a measure of 
teaching effectiveness in that three months of 
the school year had elapsed prior to the testing. 

It was assumed that the achievement of the 
pupils is a function of (1) the effects of the tea- 
cher, (2) the intelligence, age, average schol- 
astic standing, and previous knowledge of the 
subject, and (3) other factors not mentioned. 
Regression equations were set up which predict- 
ed the achievement to be expected on the basis 
of the measured pupil factors. When these predicted
achievements were subtracted from the
actual pupil achievements, the residuals were 
considered to be the results of the efforts of the 
teachers plus the effects of the unmeasured fac- 
tors. It would seem reasonable to assume that 
the contributions of the unmeasured factors were 
approximately equally distributed among the 
classes taught by these teachers. Such residual 
estimates were obtained for each class using 
both pretest and final class standings. Since 
none of the measured pupil characteristics could 
be used for predicting pupil gain, the gains them- 
selves were used as measures of effectiveness. 
In some courses the gains were, however, ad- 
justed for difficulties associated with high pre- 
test scores. 
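
The residual procedure described above can be illustrated with the Course VI pretest equation as printed in Table XX. The sketch is an assumption-laden illustration rather than the author's computation: the function names are invented, and the three pupil records are hypothetical. Only the coefficients and the course means (Table XVIII) come from the tables.

```python
def predicted_pretest_course_vi(pr, sa):
    """Predicted pretest score from P.R. and S.A., using the
    Course VI coefficients as printed in Table XX."""
    return 0.200 * pr - 2.415 * sa + 91.58

def mean_discrepancy(pupils):
    """Mean of (actual - predicted) pretest scores for one class.
    Each pupil record is a tuple (P.R., S.A., actual pretest)."""
    residuals = [
        actual - predicted_pretest_course_vi(pr, sa)
        for pr, sa, actual in pupils
    ]
    return sum(residuals) / len(residuals)

# Check: at the Course VI means (P.R. 49.84, S.A. 5.85) the equation
# should return roughly the mean pretest score of 87.48.
at_means = predicted_pretest_course_vi(49.84, 5.85)

# Hypothetical class of three pupils: (P.R., S.A., actual pretest).
hypothetical_class = [(60, 5, 95.0), (40, 7, 80.0), (50, 6, 90.0)]
class_discrepancy = mean_discrepancy(hypothetical_class)

print(round(at_means, 2), round(class_discrepancy, 2))
```

A positive mean discrepancy indicates a class that scored above what its measured pupil characteristics would predict; the study attributes such residuals to the teacher together with the unmeasured factors.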

The standings of the thirty teachers on the
eight criteria were correlated by the Pearson
product-moment method. No correlation appreciably
different from zero was discovered between
the evaluations of the teachers on the different
rating scales and the evaluations based
on the achievements of their pupils in the subject
matter areas. Although low, there appears
to be a positive association among the assessments
of teaching ability made by the principals,
pupils, peers, teachers, and the administrative
X criterion. A considerable relationship seems
to exist between two of the achievement criteria:
the evaluations of the teachers based on the final
achievement of their pupils in tests of subject
matter, when differences in age, intelligence,
average scholastic standing, and pretest knowledge
are partialed out, and the evaluations based on
pupil gain in subject matter after the adjustment
for differences in initial score.


Discussion and Implications 





The results of this study imply that the eval- 
uation of teachers is to some extent influenced 
by the choice of the criterion. Evaluation must 
be made in terms of the criterion used as well 
as the rater, the situation, and other factors. 

If a thorough knowledge of the effectiveness
of the teacher is desired, several criteria
must be used. If all of these criteria are thought
to have value, none correlated highly enough
with the others to justify its omission on the
grounds that it repeats the effects of the other
criteria. Presumably the validation of all criteria
must, in the final analysis, rest with the
philosophy adhered to.

In some instances a composite evaluation 
may seem desirable. To employ simple aver- 
ages assumes that all components are equally 
weighted, an assumption that probably is very 
questionable. Accordingly, some future re- 
search must use factor analysis or other statis- 
tical technique to properly weight the compon- 
ents. 

The present study assumed that pupil gain 
in subject matter was an important and practi- 
cal approach to criteria building. If such an 
approach is used, it would seem advisable that 
future studies be designed to investigate the relationship
between subject matter gain and gain
in the more intangible aspects of the desired 
concomitant learnings. Certainly there is a 
need of developing more reliable and valid in- 
struments for measurement of the concomitant 
learnings. 

If pupil gains are to be used as a measure of 
teaching effectiveness, the tests must be given 
under such conditions that the gains are much 
larger than the standard error of the tests. This
would require that the tests measure the objec- 
tives of instruction and that the period between 
test and retest be sufficiently long to permit 
large gains. A casual examination of the gains 
produced and the standard errors as given in 
the norms serves to advocate caution in the in- 
terpretation of gain scores as a criterion. While 
the tests employed in this study have high reli- 
ability one may question whether they covered 
the objectives of the individual teacher. 

Rating scales when used as criteria lack objectivity.
If, for example, the same rating
scale is used by two individuals, the correlations,
although positive, are usually low. This
was amply demonstrated by the correlations of
the five groups of pupil ratings and the peer
ratings as used in this investigation. If teacher
ratings are to be used, their objectivity and
reliability need to be improved.

Possibly raters tend to overrate instructors 
when using rating scales. A study of the data 
for the rating scales employed shows that the 
mean rating places Wisconsin graduates consid- 
erably above the mean for the teachers in par- 
ticipating schools. While those who train teach- 
ers at Wisconsin would prefer such an interpre- 
tation, it is more probable that it represents the 
ego of the raters and also the amount of person- 
al relationship between the rater and the ratee. 

In general, no adequate basis for validation
of teacher evaluations exists at present. There
is apparently no general agreement as to what
is good teaching, and even if there were, present-day
measures lack the reliability necessary
for valid criteria.


BIBLIOGRAPHY 


1. Adams, W. C. T. ‘*Teacher Rating Card,’’ 
American School Board Journal, LVI 
(December 1919), p. 30. 

2. Albert, H. R. ‘‘An Analysis of Teacher 
Rating by Pupils in San Antonio, Texas, ’’ 
Educational Administration and Supervis- 
ion, XXVII (April 1941), pp. 267-74. 

3. Baier, Donald E. ‘‘Reply to Travers: A 
Critical Review of the Validity and Ration- 
ale of the Forced Choice Technique, *’ 
Psychological Bulletin, XLVIII (September 
1951), pp. 421-34. 

4. Barr, A. S. and others. ‘‘The Validity of 
Certain Instruments Employed in the Meas- 

ur urement of Teaching Ability, ’’ in The Meas- 
urement of Teaching Efficiency, Helen 
M. Walker, Editor New York: Macmillan 
Co., 1935). pp. 71-141. 
‘‘Measurement of Teaching Ability,’’ 
Review of Educational Research (June 19- 
34), pp. ~ 261-67; 329-30. 
‘(Measurement of Teaching Ability,’’ 
Review of Educational Research X (June 
1940), pp. 182-84. 
‘‘The Measurement and Prediction 
of Teaching Efficiency, ’’ Review of Educa- 
tional Research, XIII (June 1943), pp. 218- 
23. 
































‘The Measurement of Teacher Char- 
acteristics and the Prediction of Teaching 
Efficiency,’’ Review of Educational 
Research, XXII (June 1952), pp. 169-73. 

9. Beecher, Dwight E. Evaluation of Teaching 
(Syracuse, N. Y.: Syracuse University 
Press, 1949), 105 pp. 

10. Betts, Gilbert L. Reliability and Validity 
of Measures of Teaching Ability or Teach- 

















ANDERSON 


11. 


12. 


13. 


14, 


15. 


16. 


17. 


18. 


19. 


20. 


21. 


22. 


23. 


24. 


69 


Success, Special Survey Studies, Na- 
tional Survey of the Education of Teach- 
ers, Bulletin, 1933, No. 10, Vol. V, 
Part II (Washington, D. C.: Government 
Printing Office, 1935), pp. 87-153. 

Bimson, Oliver H. ‘‘Do Good Teachers 
Produce Good Results ?’’ North Central 
Association Quarterly, XII (October 1937), 
pp. 271-76. 

Boardman, Charles W. Professional Tests 
as Measures of Teaching Efficiency in 
High School, High School, Contributions to Education, 
No. 327 (New York: Teachers College, 
Columbia University, 1928), 85 pp. 

Book, M. F. ‘‘The High School Teacher 
from the Pupils’ Point of View, ’’ Peda- 
gogic Seminary, XII (September 1905), 
pp. 239-88. 

Boyce, A. C. ‘‘Methods of Measuring 
Teachers Efficiency, ’’ Part II, Fourteenth 
Yearbook of N.S.S.E. (Bloomington, II1.: 
Public School Publishing Co. , 1915). 

Boyce, Robert B., and Bryan, Roy C. ‘‘To 
What Extent Do Pupils’ Opinions of Teach- 
ers Change in Later Years?’’ Journal of 
Educational Research, XXXVII (May 
1944), pp. 698-705. 

Brandt, W. J. ‘‘Followup of Some Earlier 
Wisconsin Studies of Teaching Efficiency,’’ 
Journal of Experimental Education, XVIII 
(September 1949), pp. 1-29. 

Brooks, Samuel S. ‘‘Measuring the Effic- 
iency of Teachers by Standardized Tests,’’ 
Journal of Educational Research, IV (No- 
vember 1921), pp. 255-64. 

Bryan, Roy C. ‘‘Pupil Rating of Secondary 
School Teachers, ’’ School Review, XLVI 
(May 1938), pp. 357-67. 

Butsch, R. L. C. ‘‘Teacher Rating, ’’ Re- 
view of Educational Research, I (April 
1931), pp. 99-107. 

Campbell, R. F. ‘‘Evaluation and the Rat- 
ing of Teachers, ’’ Elementary School 
Journal, XLI (May 1941), pp. 671-76. 

Cook, Walter W., and Leeds, Carroll H. 
‘*Measuring the Teaching Personality, ’’ 
Educational and Psychological Measure- 
ment VII (May 1947), pp. 399-410. 

Cook, William H. ‘‘Uniform Standards for 
Judging Teachers in South Dakota, ’’ Edu- 
cational Administration and Supervision, 
VII (1921), pp. 1-11. 

Coy, Genevieve L. ‘‘A Study of Various 
Factors Which Influence the Use of the 
Accomplishment Quotient as a Measure 
of Teaching Efficiency, ’’ Journal of Ed- 
ucational Research, XXI (January 1930), 
pp. 29-42. 

Crabs, Lelah M. Measuring Efficiency in 


Supervision and Teaching, Contributions 
to Education, No. 175 (New York: Teach- 















































70 


25. 


26. 


27. 


28. 


29. 


30. 


31. 


32. 


33. 


34. 


35. 


36. 


37. 


38. 


39. 


40. 


JOURNAL OF EXPERIMENTAL EDUCATION 


ers College, Columbia University, 1925), 
98 pp. 

Cronbach, Lee J. ‘‘Test Reliability: Its 
Meaning and Determination, ’’ Psychomet- 
rika, XII (March 1947), pp. 1-17. 

Domas, S. J., and Tiedeman, D. V. ‘‘Tea- 
cher Competence: An Annotated Bibliog- 
raphy, ’’ Journal of Experimental Educa- 
tion, XIX (December 1950), pp. 101-218. 

Evans, Kathleen M. ‘‘A Critical Survey of 
Methods of Assessing Teaching Ability, ”’ 
British Journal of Educational Psychology, 
XXI, Part I, (June 1951), pp. 89-95. 

Goetsch, E. W. ‘‘In Service Checkup of 
Iowa State Teachers College Graduates, ”’ 
School and Society, LXIX (June 1949), pp. 
419-23. 

Good, Carter V.,and others. The Method- 
ology of Educational Research (New York: 
D. Appleton-Century Co. , 1941), 863 pp. 

Gotham, R. E. ‘‘Personality and Teaching 
Efficiency, ’’ Journal of Experimental Ed- 
ucation, XIV (December 1945), pp. 157- 
65. 

Guthrie, E. R. ‘‘Evaluation of Faculty Ser- 
vice, ’’ American Association of Univer- 
sity Professors Bulletin, XXXI (June 
1945), pp. 255-62. 

Hampton, N. D. ‘‘An Analysis of Supervis- 
ory Ratings of Elementary Teachers Grad- 
uated from Iowa State Teachers College, ”’ 
Journal of Experimental Education, XX 
(December 1951), pp. 179-215. 

Hansen, BasilC. ‘Open the Curtain on 
this Farce Called Teacher Evaluation, ”’ 
Educational Administration and Supervis- 
ion, XXXII (October 1946), pp. 412-18. 

Hielman, J. D., and Arman Trout, W. D. 
‘The Rating of College Teachers on Ten 
Traits by Their Students, ’’ Journal of Ed- 
ucational Psychology, XXVII (March 1936), 
pp. 197-216. 

Hildreth, Gertrude. ‘‘Results of Repeated 
Measurement of Pupil Achievement, ’’ 
Journal of Educational Psychology, XXI 
(April 1930), pp. 286-96. 

Hill, C. W. ‘‘The Efficiency Ratings of 
Teachers, ’’ Elementary School Journal, 
XXI (February 1921), pp. 438-43. 

Hoppoch, Robert. ‘‘New York University 
Students Grade Their Professors, ’’ School 
and Society, LXVI (July 1947), pp. 70-72. 

Johnson, Palmer O. Statistical Methods in 
Research (New York: Prentice Hall, 1949), 
357 pp. 

Jones, Ronald D. ‘‘The Prediction of Teach- 
ing Efficiency from Objective Measures, ”’ 
Journal of Experimental Education, XV 
(September 1946), pp. 85-99. 

Knight, F. G. Qualities Related to Success 
in Teaching (New York: Teachers College, 


















































41. 


42. 


43. 


44. 


45. 


46. 


47. 


48. 


49. 


50. 


51. 


52. 


53. 


54. 


55. 


56. 


57. 





(Vol. XXIII 


Columbia University, 1922), 67 pp. 

Kratz, Henry Elton. Studies and Observa- 
tions in the Schoolroom (New York: Edu- 
cational Publishing Co. , 1907), 220 pp. 

LaDuke, C. V. ‘‘The Measurement of 
Teaching Ability, Study No. 3,’’ Journal 
of Experimental Education, XIV (Septem- 
ber 1945), pp. 75-100. 

Lindquist, E. F., and others. Educational 
Measurement (Washington, D. C.: Amer- 
ican Council on Education, 1951), 808 pp. 

Lins, Leo J. ‘‘The Prediction of Teaching 
Efficiency, ’’ Journal of Experimental Ed- 
ucation, XV (September 1946), pp. 2-60. 

McCartha, CarlW. ‘‘The Practice of Tea- 
cher Evaluation in the Southwest in 1948,’’ 
Journal of Educational Research, XLIV 
(October 1950), pp. 122-28. 

McNemar, Quinn. Psychological Statistics 
(New York: John Wiley and Sons, Inc. , 
1949), 344 pp. 

Merriam, J. L. Normal School Education 
and Efficiency in Teaching (New York: 
Teachers College, Columbia University, 
1906). 

Myers, G. C. ‘‘Teachers’ Ratings, ’’ School 
and Society, V (March 17, 1917), pp. 322- 
23 























Peters, C. C., and VanVoorhis, W. R. 
Statistical Procedures and Their Mathe- 
matical Bases (New York: McGraw Hill 
Book Co. , 1940), 478 pp. 

Pittenger, B. F. ‘‘Problems in Teacher 
Measurement, ’’ Journal of Educational 
Psychology, VII (1917), pp. 103-10. 

Remmers, H. H., and otheis. ‘‘Interrela- 
tionship of Various Teaching Criteria,.’’ 
American Psychologist (abstract), IV-7 
(July 1949), p. 288. 

Rider, Paul R. An Introduction to Modern 
Statistical Methods (New York: John Wiley 
and Sons, Inc. , 1939), 192 pp. 

Rolfe, J. G. ‘‘The Measurment of Teach- 
ing Ability, Study No. 2,’’ Journal of Ex- 
perimental Education, XIV (September 
1945), pp. 52-74. 

Ross, C. C. Measurement in Today’s 
Schools (New York: Prentice-Hall, 1947), 
532 pp. 

Rostker, Leon E, ‘‘The Measurement of 
Teaching Ability, Study No. 1,’’ Journal 
of Experimental Education, XIV (Septem- 
ber 1945), pp. 6-51. 

Ruediger, W. C., and Strauer, G. D. ‘‘The 
Qualities of Merit in Teachers, ’’ Journal 
of Educational Psychology, I (1910), pp. 
272-78. 

Rugg, Harold. ‘‘Is the Rating of Human 
Character Possible?’’ Journal of Educa- 
tional Psychology, XII (November, De- 
cember 1921), pp. 425-38; 485-501; XIII 


















































September, 1954) ANDERSON 


(January, October 1922), pp. 30-42; 81- 
93 


58. Ryans, DavidG. ‘‘The Criteria of Teach- 
ing Effectiveness, ’’ Journal of Educational 
Research, XLII (May 1949), pp. 690-99. 

59. Schultz, Frank G. ‘‘Appraising and Reward-
ing Teaching Effectiveness,’’ Charting the
Course for American Higher Education in
a Period of Partial Mobilization (Wash-
ington, D.C.: Department of Higher Edu-
cation, National Education Association,
1951).

60. Symonds, Percival M. ‘‘Reflections on Ob- 
servations of Teachers, ’’ Journal of Edu- 
cational Research, XLII (May 1950), pp. 
688-96. 

61. Taylor, H. ‘‘The Influence of the Teacher
on Relative Class Standing,’’ 27th Year-
book of the N.S.S.E., Part II (Blooming-
ton, Ill.: Public School Publishing Co.,
1928), pp. 97-100.


62. Thomson, Godfrey H. The Factorial Anal-
ysis of Human Ability (New York: Hough-
ton Mifflin Co., 1939), pp. 89-95.

63. Walker, H. M. Elementary Statistical
Methods (New York: Henry Holt and Co.,
1943), 298 pp.

64. Witty, Paul H. ‘‘An Evaluation of Investi-
gations of the Effective Teacher,’’ Amer-
ican Psychologist, III (July 1948), pp. 264-
65.

65. Yankey, J. V., and Anderson, P. L. ‘‘Re-
view of Literature on Factors Condition-
ing Teacher Success,’’ Educational Admin-
istration and Supervision, XIX (October
1933), pp. 511-520.

66. Buros, Oscar Krisen. The Third Mental
Measurements Yearbook (New Bruns-
wick: Rutgers University Press, 1949),
1246 pp.


























TEMPERAMENT AND TEACHING SUCCESS 


HAROLD WESLEY MONTROSS* 
University of Wisconsin Extension 
Rhinelander, Wisconsin 


SECTION I 
INTRODUCTION 


A PERSISTENT problem in the field of 
teacher education is the determination of the per- 
sonal qualities necessary for success in teach- 
ing. 

Research in this area has been greatly ex- 
tended in recent years. An annotated bibliog- 
raphy of teacher competences lists 1006 studies 
published since 1890 (15). Barr (4) has compiled 
a summary of 150 of these studies dealing spec- 
ifically with the measurement and prediction of 
teaching efficiency. 

A survey of the research in the field would seem
to show that the teacher’s personality is frequently
thought important in relation to success in teach-
ing. As early as 1910, Ruediger and Strayer (31)
found that personality was among the most sig-
nificant qualities of the successful teachers.
Witty (43) analyzed some twelve thousand letters 
submitted by pupils from all over the nation. 
The desirable teacher traits mentioned most 
frequently were: (1) cooperative democratic at- 
titude; (2) kindliness and consideration for the 
individual; (3) patience; (4) wide variety of inter- 
ests; (5) general appearance and pleasing man- 
ner. 

Although research in the field of teacher ed-
ucation would seem to indicate that personality
is an important variable in teaching success,
this variable has not yet been identified or
defined.

This study is concerned with those aspects 
of personality sometimes called temperament. 
Allport (1) has characterized temperament as a
certain class of raw material from which per-
sonality is fashioned. Cattell (10) describes it
as the ‘‘constitutional factors’’ in personality,
with ‘‘constitutional’’ referring to that
part of the personality which, at any age, is least
subject to change. Harriman (20), in The New
Dictionary of Psychology, defines temperament
as:

The more or less stable effective pat-
tern characteristic of an individual. It
is revealed by one’s susceptibility to
emotional stimulation, by reaction time,
by strength of responses, by quality and
intensity of moods and by all that is sub-
sumed under ‘emotional nature’.








Temperament is used to designate the more 
or less stable dispositions that are relatively 
permanent and evidenced at an early age. The 
earliest adjustments which can be distinguished 
in infants and which show differences are in the 
areas of spontaneous activity and emotional ex-
pression, both temperamental traits. Re-
search indicates that these characteristics of
early childhood tend to persist (1). 

Although temperament might be said to des- 
ignate a kind of raw material from which per- 
sonality is fashioned, it probably cannot accur- 
ately be said that temperament exists apart 
from personality, nor is there any personality 
which is devoid of temperament. 

The area of temperament would seem to be 
one of the logical starting points for a study of 
the personal prerequisites to teaching success. 
This would appear so inasmuch as: (1) traits
are evidenced at an early age; (2) traits are rel- 
atively persistent and stable; (3) traits are said 
to be the raw material from which personality 
is fashioned. This investigation is concerned 
with certain aspects of temperament and their 
relationship to teaching success. 


SECTION II 
DESIGN OF THE STUDY 


Description of the Subjects 


TWO HUNDRED seventy-nine students 
in the School of Education at the University of 
Wisconsin secured teaching positions in second- 
ary schools of Wisconsin in the fall of 1950. One 
hundred one of this group were visited under the 
beginning teachers visitation program of the Un- 
iversity. From this group thirty-five were will- 
ing and available to participate in further inten- 
sive testing which constituted a part of this in- 
vestigation. 

The thirty-five subjects of this investigation 
consisted of sixteen male and nineteen female 
teachers majoring in the areas of English, math- 
ematics, physical education, speech, science, 
art, music, social science, and home econom- 
ics. 

Each individual of this group had completed 
a four-year course at the University of Wiscon- 
sin and had taught at least one year in the sec- 
ondary schools of Wisconsin. 








*The author wishes to express his appreciation to his major professor, Dr. A. S. Barr, for his guid-
ance, encouragement, and inspiration throughout the study.

Table I presents a comparison between the
subjects of this investigation and the subjects in 
a study by Lins (28). Comparisons are made in 
the following areas: the mean grade point aver- 
age in Education 73, Education 74, Education 
75 practice teaching (required courses in Edu- 
cation at the University of Wisconsin), the total 
four year grade point average, and the mean 
percentile rank on the American Council on Ed- 
ucation Psychological Examination. 

The subjects of this investigation seem not 
to be greatly different from other groups stud- 
ied at the University of Wisconsin. 


Description of Data-Gathering Devices 





A major purpose of the investigation was to 
study the relationship between certain aspects 
of temperament and teaching success. Temper- 
ament was measured by the Thurstone Temper- 
ament Schedule, Cattell’s 16 Personality Fac- 
tor Test, and a series of objective type tests 
of tempo, fluency, speed, suggestibility, dispo-
sition rigidity, and dexterity-coordination.

I. The Thurstone Temperament Schedule
(Appendix B):* This is a self-rating test de-
signed to assess those traits which are relative-
ly permanent for each individual. The test was
constructed through a factorial study with three
hundred and forty questions about personality
which seemed effective in differentiating between
persons who would be considered well adjusted
and which would describe behavior resulting from
a relatively stable trait. Thurstone found that
seven factors accounted for the variance. These
he has tentatively named and described as fol-
lows (37):


a) Active (A). A person scoring high in this
area usually works and moves rapidly. He
is restless whenever he has to be quiet. He
likes to be ‘‘on the go’’ and tends to hurry.
He usually speaks, walks, writes, drives
and works rapidly, even when these activities
do not demand speed.


b) Vigorous (V). A person with a high score in
this area participates in physical sports, work
requiring the use of his hands and the use of
tools, and outdoor occupations. This group
emphasizes physical activity using large
muscle groups and great expenditure of en-
ergy.


c) Impulsive (I). High scores in this category in-
dicate a happy-go-lucky, daredevil, carefree,
acting-on-the-spur-of-the-moment disposi-
tion. The person makes decisions quickly,
enjoys competition, and changes easily from
one task to another. The decision to act or
change is quick regardless of whether the
person moves slowly or rapidly (Active), or
enjoys or dislikes strenuous projects (Vigor-
ous).


d) Dominant (D). People scoring high on this
factor think of themselves as leaders, cap-
able of taking initiative and responsibility.
They are not domineering, even though they
have leadership ability. They enjoy public
speaking, organizing social activities,
promoting new projects and persuading others.


e) Stable (E for emotionally stable). Persons
who have high Stable scores usually are
cheerful and have an even disposition. They
can relax in a noisy room, and they remain
calm in a crisis. They claim that they can
disregard distractions while studying. They
are not irritated if interrupted when concen-
trating, and they do not fret about daily
chores.


f) Sociable (S). Persons with high scores in 
this area enjoy the company of others, make 
friends easily, and are sympathetic, cooper- 
ative, and agreeable in their relations with 
people. 


g) Reflective (R). High scores in this area in-
dicate that a person likes meditative and re-
flective thinking and enjoys dealing with theo-
retical rather than practical problems. Self-
examination is characteristic of reflective
persons. These people are usually quiet,
work alone, and enjoy work that requires ac-
curacy and fine detail.


The Schedule was designed to provide a quick 
survey of the seven areas of temperament. The 
reliabilities for the seven areas in the Schedule 
were computed by the split-half method and by 
the test-retest method. These reliabilities are
summarized in Tables II and III.

The tests were administered during the sum- 
mer and fall of 1951 at the convenience of the 
subjects. Standard directions were given and 
the subjects were allowed as much time as nec- 
essary to complete the test. 

II. The Cattell 16 Factor Personality Test
(Appendix C): On the basis of factorial analysis
Cattell has designed a questionnaire type test
which sets out to measure 16 personality source
traits which he feels provide the fundamental
structure of personality. The 16 P.F. Test





*All references to Appendices may be found in original manuscript on file in Library, University of
Wisconsin.

TABLE I

COMPARISON OF THE MEAN GRADE POINT AVERAGE IN EDUCATION
73, 74, 75 PRACTICE TEACHING, TOTAL FOUR YEAR GRADE POINT
AVERAGE, AND MEAN PERCENTILE RANK ON THE AMERICAN COUN-
CIL ON EDUCATION PSYCHOLOGICAL EXAMINATION WITH THE SUB-
JECTS IN A STUDY BY LINS

                                                 Mean
Measure                                       1¹       2²
Education 73                                 2.31     2.34
Education 74                                 2.13     2.02
Education 75, Practice Teaching              2.37     2.17
Total Four Year Grade Point Average          1.93     1.88
American Council on Education Psycho-
  logical Examination                       65.63    54.45³

¹Subjects of this investigation
²Subjects in a study by Lins
³Adjusted mean for men and women


TABLE II

SPLIT-HALF RELIABILITY COEFFICIENTS FOR THE
SEVEN AREAS OF TEMPERAMENT

                    Correlation
Area              Men        Women
Active            .48         .46
Vigorous          .61         .63
Impulsive         .65         .65
Dominant      [illegible]  [illegible]
Stable            .63         .64
Sociable          .68         .73
Reflective        .73         .62

N                 200         157

TABLE III

TEST-RETEST RELIABILITY COEFFICIENTS FOR
THE SEVEN AREAS OF TEMPERAMENT

Area            Correlation
Active             .78
Vigorous           .78
Impulsive          .79
Dominant           .82
Stable             .61
Sociable        [illegible]
Reflective         .15

N                   81

TABLE IV

SPLIT-HALF RELIABILITY COEFFICIENTS FOR
THE 16 P.F. TEST

Trait           Correlation

was used as an assessment of the total personal-
ity and as an indicator of temperamental factors.
Cattell says of the test: 


The questionnaire thus aims to leave out 
no important aspect of the total person- 
ality, for the above factors are based on 
even sampling from the personality sphere 
and include abilities (intelligence), tem- 
peramental factors, and dynamic (charac- 
ter integration) source traits. (7) 


The questionnaire has equivalent A and B
forms, equal in length and design. Both forms,
comprising three hundred seventy-four items,
were used. Each item is followed by the pos-
sible responses of ‘‘Yes’’, ‘‘?’’, and ‘‘No’’.
Cattell has suggested the ‘‘forced choice’’ ans-
wer with the elimination of the ‘‘?’’ response
under certain conditions. In this investigation 
the ‘‘forced choice’’ response was used. The 
tests were administered to the subjects at their 
convenience in the summer and fall of 1951. 
Each subject was allowed as much time as nec- 
essary to complete both forms of the test. 

The split-half reliability of the test on a
sample of the general population, corrected to 
the full number of items in A and B forms is 
summarized in Table IV. 
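
The text does not name the formula by which the half-test correlations were corrected to the full number of items. The customary correction for split-half coefficients is the Spearman-Brown formula, and the sketch below assumes that it was the one applied. The function name and the sample value are illustrative only and are not taken from Table IV.

```python
def spearman_brown(r_part: float, factor: float = 2.0) -> float:
    """Estimate the reliability of a test `factor` times as long as the
    part whose correlation is `r_part` (factor=2 for split halves)."""
    return factor * r_part / (1.0 + (factor - 1.0) * r_part)


# A half-test correlation of .50 steps up to about .67 at full length.
print(round(spearman_brown(0.50), 3))  # 0.667
```

The correction always raises a positive half-test correlation, which is why corrected split-half coefficients run higher than the raw correlation between halves.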

III. The Objective Type Measures of Tem-
perament (Appendix D): It was felt that tests
which demanded no subjective self-evaluation 
might be most appropriate in this field. The 
background for the use of such tests has been 
most fully developed by Cattell (9). He has pre- 
sented tests which he terms ‘‘objective meas- 
ures’’ that are thought to indicate or assess cer- 
tain temperamental factors. Out of this, and 
other works, tests were devised and adopted. 
The areas of tempo, fluency, speed, suggesti- 
bility, disposition rigidity, and dexterity-coord- 
ination were investigated. At least two meas- 
ures were obtained from each area except for 
suggestibility where only one measure was used. 

The battery of objective type measures re- 
quired approximately one hour for administration. 
The tests were administered individually during 
the summer and fall of 1951 at the convenience 
of the subjects. 


Description of Objective Type Instruments 


Test 1: Tempo 


Tempo, in this investigation, refers to the 
natural or preferred rate of speed in any activ- 
ity. 

A number of writers, among them Frischeisen-
Köhler, Guttman, Meumann, and Wu, have con-
ducted research in the area of tempo and have
developed theories of temperament on the basis
of the existence of general personal tempo.
The most extensive work in the study of tempo 
itself has been done by Allport and Vernon (2) 
and by Harrison (21). Allport and Vernon stud- 
ied fourteen varied natural tempo rates. They 
were unable to show the existence of a general 
preferred rate or speed, finding an average in- 
tercorrelation between the fourteen rates stud- 
ied of .17. Harrison also was unable to find a 
single general factor of tempo. Cattell (7) has 
postulated that tempo can be clustered into the 
following areas: (a) ideomotor tempo, evidenced
through such activities as the normal rate of
reading, counting, handwriting, tapping, and
blackboard writing; (b) rhythm tempo evidenced 
through such activities as the normal rate of 
finger and hand tapping, leg tapping, patting, 
counting and stylus compression; (c) drawing 
tempo, evidenced through such activities as the 
normal rate of drawing figures on paper, foot 
drawing and blackboard drawing; (d) general 
tempo, evidenced through such activities as the 
normal rate of cranking, tapping, body bending, 
patting, speed decision, and card sorting; (e) 
body tempo, evidenced through such activities 
as the normal rate of head turning, body bend- 
ing and arm raising; and (f) limb tempo, evi- 
denced through such activities as the normal 
rate of arm raising and walking. 
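
The average intercorrelation cited above for Allport and Vernon's fourteen rates is the mean of the correlations between every distinct pair of measures. The sketch below shows that computation on a small invented matrix; the three-by-three values are hypothetical and are chosen only so that the mean comes to .17.

```python
from itertools import combinations


def mean_intercorrelation(corr):
    """Mean of the off-diagonal entries of a symmetric correlation matrix,
    counting each pair of measures once."""
    pairs = list(combinations(range(len(corr)), 2))
    return sum(corr[i][j] for i, j in pairs) / len(pairs)


# Hypothetical correlations among three tempo measures.
example = [
    [1.00, 0.30, 0.10],
    [0.30, 1.00, 0.11],
    [0.10, 0.11, 1.00],
]
print(round(mean_intercorrelation(example), 2))  # 0.17
```

With fourteen measures there are 91 such pairs, so a mean as low as .17 indicates that most pairs of tempo rates were only weakly related.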

The existence and the possible temperament-
al origin of personal tempos seem to be fairly
well established in the literature.

The six measures of tempo, used in this in- 
vestigation, were designed to cover the areas 
of ideomotor and general tempo. The tests 
themselves are adaptations of measures gener- 
ally used in this area. 


Subtest 1a: Subject read a mimeo-
graphed paragraph of American his-
tory. Subject was told in all tempo
tests that time was not a factor in the
test, but time was unobtrusively taken
by tester.

Score: Time of reading in seconds

Subtest 1b: Subject wrote two lines of
the paragraph used in subtest 1a. Time
recorded as in subtest 1a.

Score: Time of writing in seconds

Subtest 1c: Subject counted orally to
the number forty. Time recorded as
in subtest 1a.

Score: Time in seconds

Subtest 1d: Subject sorted cards used
in subtest 3b. Time recorded as in
subtest 1a.

Score: Time in seconds

Subtest 1e: Subject tapped on tapping in-
strument used in subtest 3a. Time re-
corded as in subtest 1a.

Score: Number of taps in thirty seconds

Subtest 1f: Subject was asked to select
from twelve cards face downward on a
table, the three cards which he felt were
the highest of the twelve. On second
trial subject was asked to select the
three cards which he felt were the low-
est from another set of twelve cards
placed in the same position on the table.
Time recorded as in subtest 1a.

Score: Average time, in seconds, of
two trials


Test 2: Fluency 


Fluency of association, referring to the spon-
taneous calling to mind of ideas, has been shown
to exist as a distinct factor. Carroll (5) and
Johnson (23), in a factoring of verbal abilities, 
both found a factor characterized by the ready, 
spontaneous flow of ideas, words or images. 
Cattell (7) suggests the presence of at least four 
distinct kinds of fluency: (a) fluency of associ- 
ation under restriction evidenced through such 
activities as giving as many words as possible, 
under time limits, beginning with a certain let- 
ter; (b) fluency of association with minimum di- 
rection evidenced through such activities as 
number of words per minute expanding on a theme
or topic or number of relevant words per minute 
describing a picture; (c) fluency and facility in 
oral speech evidenced through such activities as 
number of relevant words in describing a pic- 
ture; and (d) verbal speed evidenced through 
such activities as speed of handwriting and max- 
imum speed of reading. 

It may be noted that Cattell makes a sharp 
distinction between fluency as a temperamental 
trait and its association with an ability factor. 
He says: 


Fluency is one of those temperamental
tendencies which also outcrop in abil-
ities and are mistaken for abilities.
Thus this same fluency shows itself,
apparently, as Thurstone’s W, or verb-
al fluency factor, sharply distinguished
from V, or verbal ability, which is it-
self essentially a knowledge of vocabu-
lary and correct grammatical usage. (10)

Three measures of fluency were used in this
investigation. Two of them were developed to
assess fluency of association under restrictions.
The remaining measure was felt to assess flu-
ency of association under minimum direction.


Subtest 2a: Subject was asked to record
on a numbered sheet of paper as many
adjectives as he could think of which
could be used to describe a house. Time
limit four minutes.

Score: Number of relevant words re-
corded

Subtest 2b: Subject was asked to record
on a numbered sheet of paper as many
things as he could think of which are
‘round’ or that are called round. Time
limit four minutes.

Score: Number of relevant words re-
corded

Subtest 2c: Subject was asked to record
on a numbered sheet of paper as many
words as he could think of which begin
with the letter B. Time limit four min-
utes.

Score: Number of relevant words re-
corded


Test 3: Speed 


Speed in this investigation refers to the speed 
of repetitive or selective performance where all 
of the content is perceptually given. Much of 
the concern in the literature centers about speed 
as a general or specific factor in abilities. Har- 
rison’s work (21) seems to indicate that there 
is no single general factor of maximum speed. 
He studied a number of performances such as, 
card sorting, tapping, head turning, counting, 
reading, making decisions and found the mean 
intercorrelations to be about .15. The number 
of different types of speed performances has not 
been determined with respect to present research. 
Cattell (9) feels, though, that from the present
research we can assume the existence of six
speed factors: (a) speed of judgment and ideo-
motor performances; (b) speed of reaction time;
(c) speed of bodily movement; (d) maximal rhy-
thm; (e) fluency or speed of mental output and
association; (f) various factors of natural tempo;
and (g) perceptual speed. The relationship of
speed and temperament has not been determin-
ed although Cattell suggests a general positive
correlation. It has been further suggested that
the significance for temperament lies in the in-
ter-relationships between speed, tempo and flu-
ency. The research of Studman (36), for ex-
ample, suggests that the difference between
speed and fluency seems to be of special signif-
icance. Cattell points out their significance for
this investigation as follows:


The great interest of these speed, tempo, 
and fluency factors for personality, which 
justifies the somewhat detailed study of 
their ability roots here, is that they are 
among the most direct indicators of temp- 
erament (9). 


The tests in this area were selected to se- 
cure measures from the first two of Cattell’s 
speed factors, those of ideomotor and reaction 
time speed. These factors represent speed 
under willed action which is distinguished from 
tempo and fluency by the definitions used in this 
investigation. 


Subtest 3a: Subject tapped on tapping in- 
strument. The apparatus consisted of a 
metal plate wired to an electrical count- 
er. A metal tapping pencil was held in 
the subject’s dominant hand. With each 
tap of the pencil, contact with the metal 
plate activated the electrical counter. 
Subject was asked to tap as fast as pos- 
sible for thirty seconds. Four test read- 
ings were taken. There was a thirty 
second rest period between each test. 


Score: Average of four readings 


Subtest 3b: Subject was told to sort a
set of twenty-four cards into six label-
ed areas. The set of cards was num-
bered from two to seven and sorting
was by number only. Four test read-
ings were taken. The labeled areas
were changed on each test in order to
minimize the factor of learning. Sub-
ject was asked to go as rapidly as
possible.

Score: Average time in seconds of four
readings


Subtest 3c: Subject was checked on forty
reactions in a visual series with an ir-
regularly spaced interval of warning. The
signal used was ‘ready’, given by the test-
er before flashing an amber light spaced
midway between two keys, one of which
had to be depressed to stop the timer.
Subject used dominant hand and most con-
venient key during the test. Twenty re-
actions were taken; then the Body Sway
Suggestibility Test was interjected be-
fore checking the final twenty reactions.
Timing was in hundredths of a second.

Score: Median reading


Subtest 3d: This test is taken from the 
Wechsler Test of Mental Ability (41). 
Subject was asked to write beneath a 
series of numbers an arbitrarily as- 
signed symbol of each number. Time 
one minute. Subject was asked to go 
as rapidly as possible. 


Score: Number of digits with correct 
symbol 
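
The first three speed scores reduce a set of repeated readings to a single figure: a mean for tapping and card sorting and a median for reaction time. The sketch below illustrates that reduction for one subject. The function name, the dictionary keys, and all readings are hypothetical, and the reaction series is shortened from forty readings to five for brevity.

```python
from statistics import mean, median


def speed_scores(taps, sort_seconds, reaction_hundredths):
    """Reduce one subject's repeated readings to the Test 3 scores.

    taps                -- taps counted in each 30-second trial (Subtest 3a)
    sort_seconds        -- seconds taken for each card sort (Subtest 3b)
    reaction_hundredths -- reaction times in hundredths of a second (Subtest 3c)
    """
    return {
        "3a_mean_taps": mean(taps),
        "3b_mean_sort_seconds": mean(sort_seconds),
        "3c_median_reaction": median(reaction_hundredths),
    }


# Invented readings for a single subject.
scores = speed_scores(
    taps=[180, 184, 178, 182],
    sort_seconds=[20.0, 19.0, 18.5, 18.5],
    reaction_hundredths=[21, 19, 24, 20, 22],
)
print(scores)
```

A median is the natural choice for reaction times because one or two very slow responses would pull a mean upward without reflecting the subject's typical speed.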


Test 4: Suggestibility 


Many writers have investigated suggestibil-
ity, among them McDougall, Morgan, Allport,
Hull, and Eysenck. Eysenck (16) has perhaps
presented the most complete investigation of
the trait. Out of his factorization of sixteen al-
together different tests of suggestibility he post-
ulates the existence of two and possibly three
distinct kinds of suggestibility. These he terms
‘primary suggestibility,’ ‘secondary suggesti-
bility,’ and ‘prestige suggestibility.’

Eysenck further investigates the personality 
correlates of primary suggestibility in analyz- 
ing the scores of one thousand four hundred 
fifty subjects. He shows significant relation- 
ships between his primary suggestibility and 
traits considered in this investigation to be of 
temperamental origin. 

A well-known measure of suggestibility, and 
one used by Eysenck, is the Body Sway Test. 
It was used in this investigation. 


Subtest 4a: Subject was blindfolded. To
his collar was clipped a string which ran
through a pulley taped to the wall and was
fastened to a pointer which could move
up and down along an upright yardstick,
also fastened to the wall. Subject was
then asked to stand and relax for one
minute and then listen to two amplified
recordings. He was told not to try to in-
hibit or accelerate any feeling or move-
ment, but to try to relax and listen. The
forward and backward sway of the sub-
ject was then measured to the nearest
1/4 inch over three periods:

One minute: the natural sway of the in-
dividual.

One-half minute: the sway while the sub-
ject listened to the voice recording of
‘‘You are slowly falling forward, slowly
falling forward; you are slowly falling
forward, falling forward, falling forward,
falling, falling.’’ Repeat.

One-half minute: the sway while the subject
listened to the voice recording of ‘‘You are
slowly falling backward, slowly falling back-
ward; you are slowly falling backward, fall-
ing backward, falling backward, falling, fall-
ing.’’ Repeat.

Score: Forward and backward sway of the
subject was recorded in inches to the near-
est 1/4 inch, positive when it agreed with
the direction suggested by the recording,
negative when opposed action was shown.
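
The scoring rule for the Body Sway Test can be sketched as follows. The sign convention for the raw reading (positive for forward movement, negative for backward) and the function names are assumptions made for illustration. The text does not say how the readings from the separate periods were combined, so the sketch scores a single period only.

```python
def to_quarter_inch(inches: float) -> float:
    """Round a length in inches to the nearest 1/4 inch."""
    return round(inches * 4) / 4


def sway_score(displacement: float, suggested: str) -> float:
    """Score one period of the Body Sway Test.

    displacement -- raw sway in inches; positive means forward and negative
                    means backward (an assumed convention)
    suggested    -- "forward" or "backward", the direction named in the recording
    """
    if suggested not in ("forward", "backward"):
        raise ValueError("suggested must be 'forward' or 'backward'")
    magnitude = to_quarter_inch(abs(displacement))
    agrees = (displacement > 0) == (suggested == "forward")
    return magnitude if agrees else -magnitude


print(sway_score(1.3, "forward"))    # 1.25: swayed with the suggestion
print(sway_score(-0.6, "forward"))   # -0.5: swayed against the suggestion
```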


Test 5: Disposition Rigidity 


In normal individual behavior it often happens
that an old reaction will persist when a new one
is intended. In extreme cases, as in the schiz-
ophrenic, the individual may sit for long periods
of time repeating the same movements or words.
This behavior has been termed perseveration
by most of the writers in the literature. Cat-
tell (9) in a recent extensive review of the data
has postulated that the only assured manifesta-
tion of perseveration is as disposition rigidity.
In agreement with Cattell this investigation con-
siders disposition rigidity to refer to the resis-
tance to willed change of old established habits.
It is best measured by tests in which the individ-
ual tries to perform an old task in a new way.
For example, it could be measured by the num-
ber of times the subject could write his name in
one minute divided by the number of times he
could write it backward in one minute.


Traditionally, the tendency for behavior to
repeat itself when once started has been asso-
ciated with temperament. Eysenck (16), though,
was unable to find tests of this factor which
would distinguish between the groups he studied.
Cattell, on the other hand, feels that disposition
rigidity is one of the important indices of temp-
erament.

The measures of disposition rigidity in this
investigation are taken from a battery of tests
devised by Cattell (11) for the assessment of
this rigidity trait.


Subtest 5a: Subject was asked, under time 
limits, to write the digits 2, 3, 7, (X). 
Time one minute. Subject was asked to 
write the digits 2, 3, 7, (Y) backwards 
both in direction and position. Time one 
minute. 


Score: Number of digits written in X, over 
number of digits written in Y. 


Subtest 5b: Subject was asked, under time
limits, to write the symbol L (X). Time
thirty seconds. Subject was asked, un-
der time limits, to write the symbol J
(Y). Time thirty seconds. Subject was
asked, under time limits, to write the
two symbols (Z) alternately. Time was
one minute.


Score: Total number of symbols in X and 
Y, over total number of symbols in Z. 


Subtest 5c: Subject was asked to trace 
through a maze (X) laid out in blocks 
comparable to a city street system. 
Letters in the maze paths indicated the 
direction subject was to go. D indicated 
down, L indicated left, R indicated right 
and U indicated up. Time one minute. 
Subject was asked to trace through a sim- 
ilar second maze (Y). In maze Y subject 
was asked to do the direct opposite of 
what the letters in the paths suggested. 
Time was two minutes. 


Score: Total number of correct decisions 
made in X over total number of correct 
decisions made in Y. 
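
Each disposition rigidity score is a ratio of output on the habitual form of a task to output on the reversed or alternated form. The sketch below writes the three ratios out as given in the score lines above. The function names and sample counts are hypothetical, and no adjustment is made for the unequal time limits in Subtest 5c because the text describes none.

```python
def ratio(habitual: float, altered: float) -> float:
    """Output on the habitual task divided by output on the altered task."""
    if altered <= 0:
        raise ValueError("the altered task must have a positive count")
    return habitual / altered


def rigidity_5a(digits_x: int, digits_y: int) -> float:
    """Digits written normally (X) over digits written backwards (Y)."""
    return ratio(digits_x, digits_y)


def rigidity_5b(symbols_x: int, symbols_y: int, symbols_z: int) -> float:
    """Symbols written separately (X + Y) over symbols written alternately (Z)."""
    return ratio(symbols_x + symbols_y, symbols_z)


def rigidity_5c(correct_x: int, correct_y: int) -> float:
    """Correct maze decisions following the letters (X) over those opposing them (Y)."""
    return ratio(correct_x, correct_y)


# Invented counts for one subject.
print(rigidity_5a(120, 40))     # 3.0
print(rigidity_5b(50, 48, 70))  # 1.4
```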


Test 6: Dexterity-Coordination





The significance or temperamental implica-
tions of dexterity-coordination are not as clear
as in the other objective test measures. Ey-
senck (16) finds finger dexterity impaired in
individuals of certain temperamental types.
Cattell (10) has found what he terms some sort
of psychomotor efficiency related to tempera-
ment. Schwartz (35), although not concerned
with temperament directly, found a significant
correlation (-.60) between a two hand coordin-
ation test and field teaching.

On the strength of these suggested relation-
ships two tests were used in this investigation.
Both are adaptations by the author which, it was
felt, could give the desired assessment for this
area.


Subtest 6a: Instrument in Subtest 3a was
used. Rather than tap on metal plate sub-
ject was asked to put tapping pencil alter-
nately into three holes placed in a triang-
ular arrangement on the metal plate.
When tapping pencil touched the bottom
of the hole the electrical counter was ac-
tivated. Time thirty seconds. Subject
was asked to go as rapidly as possible.
Four trials were given, two with the right
or dominant hand and two with the left or
non-dominant hand.

Score: Difference between the average
number of taps of right and left hand.

Subtest 6b: A sheet of paper was blocked off
into sixty equal squares. With a pencil
in each hand subject was asked to mark
simultaneously a straight line and a cross
within each square successively for thirty
seconds.

Score: Number of squares filled with both
symbols.
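
The two dexterity-coordination scores can be sketched as follows. The direction of the subtraction in Subtest 6a (dominant hand minus non-dominant hand) is an assumption, since the text states only that a difference was taken. The function names and the sample data are hypothetical.

```python
from statistics import mean


def dexterity_6a(dominant_trials, nondominant_trials):
    """Mean taps with the dominant hand minus mean taps with the other hand."""
    return mean(dominant_trials) - mean(nondominant_trials)


def dexterity_6b(squares):
    """Count the squares that contain both the line and the cross.

    squares -- one (has_line, has_cross) pair of booleans per square attempted
    """
    return sum(1 for has_line, has_cross in squares if has_line and has_cross)


# Invented trial counts and a short invented record of marked squares.
print(dexterity_6a([60, 64], [50, 52]))  # 11
print(dexterity_6b([(True, True), (True, False), (True, True)]))  # 2
```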


IV. The Criteria of Teaching Success: Tea-
ching success, in this investigation, refers to
(a) the acceptability of the teacher to the prin-
cipal or superintendent; and (b) teaching
effectiveness as that effectiveness is
defined on the Wisconsin Adaptation of the M-
Blank. (See Appendix A.)


Description of Ratings 


Rating 1: Acceptability 


A rating, termed ‘‘acceptability’’ by Lamke 
(26) in a previous investigation, was obtained. 
To obtain the rating an interview was conducted 
with the teacher’s principal or superintendent, 
in which, during the course of the interview, the 
following questions were asked: 


1. How do you like this teacher? 

2. Does he get along well with students? 

3. Do the parents feel the same way as the stud- 
ents do about the teacher? 

4. Is the teacher happy here? Does he have 
friends on the faculty, in the community, and 
the like? 

5. Does the teacher have any particular weak-
nesses or strengths?

6. In terms of beginning teachers with whom you
have worked in the past, and in the light of
our discussion as a whole, would you consid-
er this teacher as among the best, above av-
erage, average, below average, or failing as
a teacher?


Out of his consideration of the questions ask- 
ed, the principal gave an overall rating of the 
teacher. 


Rating 2: Principal or Superintendent’s Effic- 
iency Rating 


The rating was given by the teacher’s princi- 
pal or superintendent using the Wisconsin adap- 
tation of the M-Blank as a guide for the assess- 
ment of the teacher’s effectiveness. 


Rating 3: Experienced Rater I


Each subject was visited in his classroom by
two experienced raters. A group of six graduate
students, trained for the visitation program,
formed what was termed the core group.
The remaining raters were staff members of
the University of Wisconsin. At least one member
of the core group was included on each visitation.
This core group was labeled Experienced Rater I.

Independent ratings were made of the teach- 
er’s effectiveness using the Wisconsin adapta- 
tion of the M-Blank as a guide. 


Rating 4: Experienced Rater II 


This rating was obtained under the same
conditions as rating number 3 and was given by
the staff members of the University of Wisconsin.

These four ratings were obtained in the tea- 
cher’s first year of teaching and after he had 
been teaching for at least three months. 


Rating 5: Principal or Superintendent’s Efficiency
Rating in Second Year of Teaching


An abbreviated form (Appendix A) of the 
Wisconsin adaptation of the M-Blank was used 
for this rating. It was obtained in the subjects’ 
second year of teaching, at least three months 
after the beginning of the school year. 





Reliability and Validity of Objective Measures 
of Temperament 


Cronbach (14) suggests the consideration of
three types of reliability coefficients: the
coefficient of equivalence, the coefficient of
stability, and the coefficient of stability and
equivalence. The equivalence aspect of
reliability estimates how precisely the test
measures an individual sample of behavior at
a particular time. It can be determined by the
split-half method, by the Kuder-Richardson
formula, or by immediate parallel testing. The
stability aspect estimates whether or not a
behavior at the time of testing is typical of that
behavior at any other time. It can be estimated
by the retest method. The stability and
equivalence aspect of reliability gives an
estimate of change over a period of time together
with the fluctuations between two supposedly
equivalent tests. The choice of a particular type
of reliability estimate depends upon the
situation. In the objective measures of temperament
the primary interest is in whether or not
the particular behavior sampled at the time of
testing is typical of that same behavior at some
later time. The appropriate estimate is therefore
the coefficient of stability obtained by a retest
after a lapse of time. It did not seem feasible,
at the time of testing, to obtain this
reliability estimate.
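
As a brief illustration of the coefficient of equivalence described
above, the split-half method correlates the two halves of a test and
then steps that correlation up to full test length with the
Spearman-Brown formula. The sketch below assumes the half-test
correlation has already been computed; the function name and the
sample value are illustrative only and are not taken from this
investigation.

```python
def spearman_brown(r_part, length_factor=2.0):
    """Step a part-test correlation up to a longer test.

    r_part        -- correlation between the two parts (e.g. halves)
    length_factor -- how many times longer the full test is than a part
                     (2.0 for the ordinary split-half case)
    """
    if not -1.0 < r_part <= 1.0:
        raise ValueError("r_part must lie in (-1, 1]")
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    denominator = 1.0 + (length_factor - 1.0) * r_part
    if denominator <= 0:
        raise ValueError("formula is undefined for these inputs")
    return length_factor * r_part / denominator


# Hypothetical half-test correlation of .50 (not a value from the study).
full_length_estimate = spearman_brown(0.50)
print(round(full_length_estimate, 3))
```

A retest (stability) coefficient needs no such step-up: it is simply
the product-moment correlation between the scores from the two
administrations.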

TABLE V

INTERCORRELATIONS OF CRITERIA RATINGS
(N = 34)

                                     Correlation
Rating                          1       2       3       4

Principal’s Acceptability      --
Principal’s M-Blank            .93     --
Rater Number 1             [illegible] .55     --
Rater Number 2                 .64     .57     .73     --
Principal’s M-Blank 2          .50     .53     .39     .38

TABLE VI

RELATIONSHIP BETWEEN THE FOUR RATINGS IN COMPOSITE
ONE AND THE FIRST GENERAL FACTOR

                               General Factor
Rating                            Loading

Principal’s Acceptability          .94
Principal’s M-Blank                .87
Rater 1                            .85
Rater 2                            .84

TABLE VII

RELATIONSHIP BETWEEN THE FIVE RATINGS IN COMPOSITE
TWO AND THE FIRST GENERAL FACTOR

                               General Factor
Rating                            Loading

Principal’s Acceptability          .92
Principal’s M-Blank                .87
Rater 1                            .83
Rater 2                            .81
Principal’s M-Blank 2              .68

A number of reliability estimates for tests that
fall under the general classification of
mechanical and motor tests can be found in the
literature. A number of these tests are of the
same type used in this investigation.

Woodworth (44), for example, reports the
retest reliability of reaction time as being about
.90 when given under the same conditions as the
test for reaction time in this investigation.
Greene (18) reports on the use of a tapping test
with a maximum retest correlation of about .91.
Several batteries of motor tests are further
described by Greene. These batteries include
subtests of the kind used in this investigation as
objective indicators of temperament. One of these,
the Stanford Motor Skills Tests (18), has
reported retest reliabilities of .75 to .86
for each item in the battery.

Research studies in the areas of tempo, fluency,
suggestibility, and disposition rigidity are
not as numerous as are the studies in the area
of mechanical and motor abilities. Consequently,
reliability estimates are not reported as
often.

Perhaps the most comprehensive study has 
been done in the area of tempo by Allport and 
Vernon (2). They report retest reliabilities, 
showing consistency of speed with a single task 
at different times, in such areas as reading, 
counting, and handwriting to be .88, .85, and 
.87, respectively. 

Although reliability estimates are reasonably 
high in the literature, any subsequent investiga- 
tion in this area must establish the retest relia- 
bilities of the objective measures of tempera- 
ment. 

A test is said to be valid when it measures
what it purports to measure. The tests used in
this investigation as measures of temperament
purport to be valid indicators of temperament.
That claim may be questioned, but whatever the
tests may be measuring, and aside from their
validity as temperamental indicators, their
relationship to teaching success as it is
defined in this investigation is of interest.


Reliability and Validity of Ratings 





No reliability estimates of the five ratings 
used in the criterion were obtained. The inter- 
correlations of these criterion ratings can be 
seen in Table V. 

As may be seen, the Principal’s Acceptability
and the Principal’s M-Blank ratings show a
correlation of .93. These two ratings were
obtained in the first year of teaching at different
times, with different instruments, and with the
same individual giving both ratings. It seems
reasonable to assume that, had the same instrument
been used for both ratings, the resulting
reliability estimate would have been higher than
.93. Raters one and two, using
the same instrument at the same time, correlated
.73. It can be seen that the relationship
between rater one and rater two, and the
relationship between the Principal’s Acceptability
and the Principal’s M-Blank rating, are each
greater than the relationship of either of the
principal’s ratings with either of raters one and two.

The Principal’s M-Blank rating number
two, which was obtained in the second year of
teaching with a different instrument but with
the same rater as the Principal’s Acceptability
and the Principal’s M-Blank ratings, shows a
relationship with the Principal’s M-Blank and
Principal’s Acceptability of .53 and .50,
respectively. It is not known, though, whether the
change in rating is the result of a different
opinion held by the individual giving the
rating, of a change in the individual being rated,
or of a difference in instruments.


Treatment of the Data 





The raw scores of the objective measures
of temperament, of the Thurstone Temperament
Schedule, and of the Cattell 16 P. F. Test
were used in all calculations. The rating scores
were treated as composites. Intercorrelations
between the various criterion scores are
given in Table V. Two composite scores were
constructed. The first was a summation of
the four ratings obtained in the subject’s first
year of teaching: the Principal’s Acceptability
rating, the Principal’s M-Blank rating, and the
ratings of Raters 1 and 2. The second composite was
based on fewer cases, inasmuch as
nine subjects were no longer teaching in the
second year. It consisted of the four ratings
which composed composite one plus
a second Principal’s M-Blank rating.
The criterion scores, hereafter, will be
referred to as composite one and composite
two.

Further relationships between the individual 
ratings are given in Tables VI and VII. 

Since the correlations between the individual
ratings and the general factor are all relatively
high, the ratings can essentially be
considered as measuring the same thing.

Throughout the study the Pearson Pro- 
duct Moment Coefficient of Correlation has been 
employed to determine the amount of relation- 
ship between measures of temperament and 
the criteria. 
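
The two steps just described, summing the ratings into a composite
criterion and correlating a temperament measure with it, can be
sketched as follows. This is only a minimal illustration: the function
names are hypothetical, and the ratings and tapping scores are invented
for the example rather than drawn from the data of this investigation.

```python
import math


def composite(ratings_per_subject):
    """Sum each subject's ratings into a single criterion score."""
    return [sum(ratings) for ratings in ratings_per_subject]


def pearson_r(x, y):
    """Pearson product-moment coefficient of correlation."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return sxy / math.sqrt(sxx * syy)


# Invented data: four ratings for each of four hypothetical teachers,
# and one hypothetical objective measure (taps per trial).
ratings = [(3, 4, 3, 4), (2, 2, 3, 2), (5, 4, 4, 5), (1, 2, 2, 1)]
tapping = [52.0, 47.0, 60.0, 41.0]

criterion = composite(ratings)
r = pearson_r(tapping, criterion)
print(criterion, round(r, 2))
```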

To determine the level of significance
of a coefficient of correlation, tables in Lindquist
(27) were used. The tables give significance at
the five and one percent levels for samples of
various sizes.
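
For readers without such tables at hand, the significance of a
correlation can be approximated numerically. The sketch below uses
Fisher's z transformation with a normal approximation; this is a
substitute technique chosen for illustration, not the method behind the
Lindquist tables, so results very close to a critical value may differ
from the tabled ones. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via Fisher's z approximation."""
    if n <= 3:
        raise ValueError("n must exceed 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def significance_level(r, n):
    """Classify r as significant at the 1% level, the 5% level, or neither."""
    p = approx_p_value(r, n)
    if p < 0.01:
        return "1% level"
    if p < 0.05:
        return "5% level"
    return "not significant"


print(significance_level(0.45, 34))
print(significance_level(0.41, 34))
print(significance_level(0.28, 26))
```

Under this approximation a coefficient of .45 with 34 cases falls
beyond the one percent level, .41 with 34 cases beyond the five percent
level only, and .28 with 26 cases reaches neither level.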

TABLE VIII

CORRELATIONS BETWEEN THURSTONE TEMPERAMENT
SCHEDULE AND COMPOSITE ONE AND TWO

Temperament       Composite    Composite
Trait               No. 1        No. 2

A                    .01          .05
V                   -.23         -.29
I                   -.05         -.10
D                    .26          .14
E                   -.03         -.08
S                    .05         -.16
R                    .03      [illegible]

TABLE IX

CORRELATIONS BETWEEN THE CATTELL 16 P. F. TEST
AND COMPOSITE ONE AND TWO

                  Composite    Composite
Trait               No. 1        No. 2

A                    .27          .40
B                    .11          .18
C                    .26          .20
E                    .03          .10
F                   -.01          .10
G                   -.21         -.11
H                   -.15          .02
I                    .21          .14
L                    .06          .11
M                   -.12      [illegible]
N                    .14          .30
O                   -.01         -.01
Q1                   .21          .32
Q2               [illegible]      .00
Q3                  -.02         -.08
Q4                  -.03          .06

TABLE X

CORRELATIONS BETWEEN THE OBJECTIVE MEASURES OF
TEMPERAMENT AND COMPOSITE ONE AND TWO

                                          Correlation*
                                      Composite    Composite
Measure                                 No. 1        No. 2

Tempo Reading                            .10          .03
Tempo Writing                            .03          .00
Tempo Counting                           .06          .23
Tempo Tapping                            .10          .14
Tempo Card Sorting                       .06          .31
Speed of Decision                        .05          .00
Speed Tapping                            .45          .47
Speed Card Sorting                       .08          .26
Digit Symbol                             .17          .36
Reaction Time                            .12          .11
Fluency (Adjectives)                     .41          .28
Fluency (Things Round)                   .09          .03
Fluency (First Letter)                   .29          .21
Two Hand Coordination                    .13          .21
Right and Left Hand Coordination         .57          .47
Sway (Total Amount)                      .02          .20
Disposition Rigidity (Subtest 5a)        .07          .14
Disposition Rigidity (Subtest 5b)        .20          .19
Disposition Rigidity (Subtest 5c)        .11          .08

N                                        34           26

*All signs changed to indicate positive relationship


TABLE XI

CORRELATIONS BETWEEN COMBINATIONS AND PARTS OF
THE OBJECTIVE MEASURES OF TEMPERAMENT AND
COMPOSITE ONE AND TWO

                                          Correlation*
                                      Composite    Composite
Measure                                 No. 1        No. 2

Maze Number 1 (Subtest 5c)               .42          .36
Deviation of Reaction Time
  from Median in Positive
  Direction                              .51          .39

N                                        34           26

*All signs changed to indicate positive relationship.

SECTION III
ANALYSIS OF THE DATA


Correlations Between Thurstone Temperament
Schedule and Composite One and Two


THE CALCULATED correlations for the
seven areas of temperament in the Thurstone
Temperament Schedule, showing their relationship
to composite one and two, are given in Table
VIII. The raw scores for the Schedule are given
in Table IX. None of the correlations differs
greatly from zero.


Correlations Between the Cattell 16 P. F. Test
and Composite One and Two


The correlations for this assessment of total
personality are shown in Table IX. The raw
scores for the 16 P. F. Test are given in Table
XVII. None of the correlations except the .40
correlation between Factor A and composite two
differs greatly from zero. The correlations
between Factors N and Q1 and composite two were
.30 and .32, respectively.


Correlations Between the Objective Measures
of Temperament and Composite One and Two


The correlations for the objective measures
of temperament and the composites are presented
in Table X. Raw scores of the objective
measures are shown in Table XVIII.

The correlations of Speed of Tapping (.45),
Right and Left Hand Coordination (.57), and
Fluency (.41) with composite one are among
the larger coefficients of correlation.

The correlations of Speed of Tapping (.47)
and Right and Left Hand Coordination (.47) with
composite two approximate those with composite
one. The correlation between Fluency (.28)
and composite two, while positive, is somewhat
less.


Correlations Between Combinations and Parts
of the Objective Measures of Temperament
and Composite One and Two


In Section II it was suggested that the relationship
between several of the objective measures
might be significant in any assessment of
temperament; for example, the relationship
between speed, tempo, and fluency as possible
temperamental indicators. Following this lead,
several combinations of measures were made.
Two of them yielded statistically significant
correlations. These are presented in Table XI.
The raw scores for these measures are presented
in Table XIX.

These correlations are consistent for both
composites, except for Maze Number 1 and
composite two, where r = .36.


Intercorrelations of Objective Measures Significantly
Correlated with Criteria


From an inspection of Tables X and XI, it
may be seen that five of the objective measures
of temperament show a significant relationship
with the criteria; that is, if one is willing to
look upon the group here studied as randomly
drawn, which, strictly speaking, it is not. A
summary of these measures is presented in
Table XII.

The intercorrelations for the five objective
measures showing a significant relationship
with the criteria were computed separately for
each composite, since the number of cases in
each differs.

For composite one the intercorrelations are
presented in Table XIII.

A multiple correlation was then computed
using the five objective measures: Speed of
Tapping, the Deviation of Reaction Time in a
Positive Direction, Fluency (number of adjectives
describing a house), Right and Left Hand
Coordination, and Maze Number 1. Using the
Aitken Modified Method of Pivotal Condensation
as the means of calculation, the value of this
multiple correlation was .734 with an expected
shrinkage to .69 (29).
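
A multiple correlation of this kind can be obtained from the
intercorrelations among the predictors together with their
correlations with the criterion. The sketch below is illustrative only.
It solves the normal equations by ordinary Gaussian elimination, a
substitute for the pivotal condensation routine named above, and its
shrinkage adjustment is an assumed Wherry-type formula, since the
formula in reference (29) is not reproduced in the text. All function
names and the two-predictor example values are hypothetical.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("intercorrelation matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def multiple_r(intercorrelations, validities):
    """Multiple R from predictor intercorrelations and validities."""
    betas = solve(intercorrelations, validities)
    r_squared = sum(b * v for b, v in zip(betas, validities))
    if r_squared < 0:
        raise ValueError("inconsistent correlation values")
    return math.sqrt(r_squared)


def shrunken_r(r, n_cases, n_predictors):
    """Assumed Wherry-type shrinkage: 1 - (1 - R^2)(N - 1)/(N - m)."""
    adjusted = 1.0 - (1.0 - r * r) * (n_cases - 1) / (n_cases - n_predictors)
    return math.sqrt(max(adjusted, 0.0))


# Hypothetical two-predictor example (values not from this study).
r_multiple = multiple_r([[1.0, 0.5], [0.5, 1.0]], [0.6, 0.5])
print(round(r_multiple, 3), round(shrunken_r(0.734, 34, 5), 2))
```

Under the assumed shrinkage formula, a multiple correlation of .734
based on 34 cases and five predictors shrinks to about .69, which
agrees with the value reported above.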

The intercorrelations of the five objective 
measures for the twenty-six cases which com- 
prise Composite two are presented in Table XIV. 

Using the same type of procedure as above,
a multiple correlation was computed, with a
value of .66 and an expected shrinkage to
.596.

The multiple correlation coefficient for composite
two would ordinarily be expected to be
lower than the multiple correlation for composite
one, since the range in the criterion scores
in the second composite is less than in composite
one. The standard deviation for composite
one is .92; for composite two it is .71. It may
be recalled that composite two is the criterion
score for those subjects in their second year
of teaching and that nine subjects from the
original test group were no longer teaching in the
second year. Six of these nine subjects were
below the general mean of the criterion scores.
Accordingly, the subjects of composite two
were more homogeneous with respect to the
criteria ratings, and the objective measures of
temperament, in general, were less discriminatory
for this more homogeneous group.
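
The effect of a narrower range on a correlation can be illustrated
with the usual formula for direct restriction of range on one
variable. This is a sketch under stated assumptions, not a calculation
made in this investigation: it assumes linear regression and equal
scatter about the regression line, and it treats the two reported
standard deviations as if they were on a comparable scale, which the
text does not establish. The function name is hypothetical.

```python
import math


def adjust_for_range(r, sd_ratio):
    """Expected correlation after the SD of one variable changes.

    r        -- correlation in the original group
    sd_ratio -- new standard deviation divided by the original one
                (below 1 for a more homogeneous group)
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    denominator = math.sqrt(1.0 - r * r + r * r * sd_ratio * sd_ratio)
    return r * sd_ratio / denominator


# Illustration with the reported figures, under the assumptions above.
restricted = adjust_for_range(0.734, 0.71 / 0.92)
recovered = adjust_for_range(restricted, 0.92 / 0.71)
print(restricted < 0.734, round(recovered, 3))
```

Applying the same function with the reciprocal ratio reverses the
adjustment, which is how the formula is ordinarily used to estimate the
correlation in an unrestricted group.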

Factor A of the Cattell 16 P. F. Test correlated
.40 with the composite two criteria. This
factor was included in a multiple correlation
with the five objective measures. It raised the

TABLE XII

OBJECTIVE MEASURES SIGNIFICANTLY CORRELATED
WITH CRITERIA

                                      Composite    Composite
Measure                                 No. 1        No. 2

Speed Tapping                            .45*         .47**
Deviation of Reaction Time in
  a Positive Direction                   .51*         .39**
Fluency (Adjectives)                     .41**        .28
Right and Left Hand Coordination         .57*         .47**
Maze Number 1                            .42**        .36

N                                        34           26

* Significant at 1% level
**Significant at 5% level


TABLE XIII

INTERCORRELATIONS OF THE FIVE OBJECTIVE MEASURES SIGNIFICANTLY
CORRELATED WITH CRITERIA (COMPOSITE ONE)
(N = 35)

                                               Correlation*
Measure                                     1      2      3      4

Speed Tapping                              --
Deviation of Reaction Time in a Positive
  Direction                                .15    --
Fluency (Adjectives)                       .19    .08    --
Right and Left Hand Coordination           .51    .37    .28    --
Maze Number 1                              .35    .37    .28    .32

*Signs changed to indicate positive relationship

TABLE XIV

INTERCORRELATIONS OF THE FIVE OBJECTIVE MEASURES SIGNIFICANTLY
CORRELATED WITH CRITERIA (COMPOSITE TWO)
(N = 26)

                                               Correlation
Measure                                     1      2      3      4

Speed Tapping                              --
Deviation of Reaction Time in a Positive
  Direction                                .05    --
Fluency (Adjectives)                       .12    .01    --
Right and Left Hand Coordination           .48    .21    .20    --
Maze Number 1                              .37    .23    .28  [illegible]

TABLE XV

CORRELATION BETWEEN THE FIVE OBJECTIVE MEASURES
OF TEMPERAMENT AND FACTOR A OF THE
CATTELL 16 P. F. TEST

                                             Correlation*
Objective Measures                             Factor A

Speed Tapping                                    .04
Deviation of Reaction Time in a Positive
  Direction                                      .04
Fluency (Adjectives)                             .20
Right and Left Hand Coordination                 .48
Maze Number 1                                    .07

*All signs changed to indicate positive relationship

value of the multiple correlation coefficient to
.71.

Although Factor A, when added to the objective
measures as a predictive factor, does raise
the multiple correlation, such a gain is not made
without losses.

The relationship of Factor A to the five objective
measures is presented in Table XV.


Study of Individual Cases 





Material for representative case studies was 
available in the data booklet* used at the Univer- 
sity of Wisconsin each year to collect informa- 
tion from students preparing for the teaching 
profession. 

Six case studies were made, three ranked in 
the first quartile on the criteria scores (Subjects 
5, 21, 24) and the remaining three ranked in the 
fourth quartile (Subjects 4, 28, 35). 

It was felt, from the data available, that the
following areas would be of importance: sex,
marital status, data on size of the family, the
parents’ educational experiences, number and
kinds of educational awards, the reasons for
entering the teaching profession, any work
experience during college, work experience during
vacation periods, all extra-curricular and other
activities participated in by the subject, and the
subject’s answer to the question of what he would
do if he had plenty of money.

In order to make the data more accessible
for comparison, the case history of each subject
will be presented in graph form on the next few
pages.

In reviewing the case studies of those subjects
ranked in the first and fourth quartiles, there
do not seem to be any clear-cut distinguishing
areas. It may be significant to note, however,
that in general those ranked in the upper quartile
tended to place teaching near the top in their
choice when asked what they
would do if money were no object; those in the
lower quartile tended to place teaching near the
bottom of the list or, as in two cases used in
this investigation, eliminated it entirely.

There does seem to be some evidence to in- 
dicate that those teachers ranked in the upper 
quartile list more extra-curricular activities 
than do those in the lower quartile. 

With the exception of the above areas there 
did not appear to be any clear cut differences 
between upper and lower quartile teachers 
in the data available for the case studies. 

SECTION IV
SUMMARY AND CONCLUSIONS


Statement of the Problem


THE PURPOSE of this investigation is
to study the relationship between certain aspects
of temperament and teaching success.


Design of the Study 


The subjects of this investigation were thirty- 
five students who had graduated from the Uni- 
versity of Wisconsin and who had secured teach- 
ing positions in secondary schools of Wisconsin 
in the fall of 1950. 

Aspects of temperament were measured by 
(a) the Thurstone Temperament Schedule, (b) 
the Cattell 16 Personality Factor Test, (c) a 
series of objective tests of tempo, fluency, 
speed, suggestibility, disposition rigidity, and 
dexterity—coordination. 

The criteria of teaching success were developed
from five ratings of each teacher: (a) the
principal’s acceptability, (b) the principal’s
M-Blank, (c) a rating given by one of a core group
of graduate students who were trained in the
visitation program and who visited the teacher in his
classroom, (d) a rating given by a staff member of
the University of Wisconsin, and (e) a second
M-Blank rating given by the principal in the
second year of teaching.

Using a composite score for the criteria,
and the raw score from the measures of temperament,
Pearson product-moment correlations
were calculated to determine the relationship
between the aspects of temperament and teaching
success.


Summary of Findings 





No significant relationships were found be- 
tween the seven areas of temperament of the 
Thurstone Temperament Schedule and the cri- 
teria. 

In the Cattell 16 P. F. Test, only one factor,
namely A, appeared significant. This was with
composite two and was not consistent with the
correlation with composite one. When placed in a
multiple correlation with the measures found
significant in the objective tests, however, it
increased the value of the multiple correlation from
.66 to .714.





*Teacher Personnel Research Committee, Data Booklet, School of Education, University of Wisconsin,
Madison, Wisconsin.

Five of the objective measures of temperament
were found to be significantly correlated
with the criteria, namely, (a) Speed of Tapping,
(b) Deviation of the Reaction Time in a Positive
Direction from the Median, (c) Fluency (adjectives),
(d) Right and Left Hand Coordination,
and (e) Maze Number 1. These correlations
ranged from .39 to .57.

A multiple correlation of .734 was secured
for composite one, and of .66 for composite two,
when these five objective measures were employed
in a regression equation.


Conclusions 


In light of the results of this investigation it 
seems reasonable to postulate that there may be 
certain temperamental patterns which will dis- 
tinguish between good and poor teachers as meas- 
ured by ratings of principals and others trained 
to evaluate teacher effectiveness. 

It would seem from the data in this investigation
that identification of these temperamental
patterns can best be accomplished through the
use of objective measures. The Thurstone
Temperament Schedule and the Cattell 16 P. F.
Test seemingly fail to identify aspects of
temperamental behavior which are related to success
in teaching as measured in this investigation.

It may be argued that the objective measures
used in this investigation are restricted to
measuring only what they literally measure in the
particular test situation; that is, the Speed of
Tapping Test measures only the speed of the subject’s
tapping in the particular situation in which the
sample of behavior was observed. It seems,
however, that further and broader interpretations
might be inferred from these test
situations.

If one makes these interpretations, a ‘‘good’’
teacher may be described as follows. The ‘‘good’’
teacher would seem to be characterized as
possessing a particular type of fluency. He has
the ability to associate ideas and things. He
would seem to be able to maintain a definite
mental set; that is, he can better concentrate,
over a period of time, upon an assigned task,
as evidenced through the small variation between
his median score on the Reaction Time Test
and the deviation of his reactions in a positive
direction from his own median. Relatively, he
would seem to show more interest and concern
for the task at hand and seems to have a desire
to work near the upper limits of his ability most
of the time in accomplishing that task. In somewhat
the same vein he may be characterized as
having more of what can be termed ‘‘drive’’ or
determination to succeed. Finally, the ‘‘good’’
teacher would seem to possess more speed in
performing tasks of the psychomotor type.
Although this would seem to be of temperamental
origin, its temperamental correlates in
observable behavior are not clear.